ABSTRACT

A computer decision aiding system for debiasing and combining
information from multiple sources is proposed. The algorithm is
based on six assumptions that apply when the sources are relatively
knowledgeable with respect to the operator on the variable
of interest, and the operator is willing to base his evaluation of
their performance on a previously selected (finite) sequence of so-called
calibration variables. It is also assumed that the operator
is interested in maximizing gains, that is, he wishes to act in an
optimal or Bayesian manner. An experiment with two human
sources of information was conducted to evaluate the performance
of the aiding system under a variety of loss functions. On a
family of bilinear loss functions, the output of the aid was found
to perform better than a naive scheme like simply
believing the information the two sources gave. The combination
rule was also found to perform better than the output of any
individual source.


1.  INTRODUCTION 



The essence of the presence of an operator in a supervisory control situation is that he can choose
and execute a control action. According to the principles of decision theory, the operator's preference
among control actions can be conveniently described in terms of what he knows about
his system and what he wants to achieve, the two being postulated to be independent of each
other. What he knows, his beliefs, can be represented numerically in terms of a "subjective probability
function" over the possible states of the system, and what he wants, his values, in terms of a
(subjective) "utility function" or "loss function" over the possible consequences of his actions.

Thus, the operator's behavior can be described in terms of two notions, which in turn can be
uniquely represented by these two functions, in much the same way as, for example, the behavior
of a particle can be represented conveniently in terms of its mass and the forces acting on it,
which, in turn, can be uniquely represented by real numbers.

The companion report by Charny and Sheridan [1986] on a computer aiding technique for
satisficing investigates how to aid the operator in establishing what he wants. This report is
devoted to aiding the operator in gathering knowledge about the state of the system, or,
equivalently, in developing an accurate mental model of the values of the state variables of the system.

The way we obtain knowledge is by observation. However, observing the state of a system in a
typical supervisory control situation is far less straightforward than checking, say, whether
the grass is still green, or, to take a favorite statistical paradigm, whether a coin has
landed heads or tails up. Observations in supervisory control situations usually reach
the operator indirectly, via some source of information such as the equipment on his control
panel, "intelligent" devices like fault location systems, computer databases or human experts. An
important practical difference is that, if the information comes to the operator via an information
source, it may be distorted or biased in some way, whereas in the direct case (barring
hallucinogenic states of mind) what you see is what you get. For example, a source may systematically
overestimate certain quantities, or it might be systematically overconfident. Therefore, since
an information source may not give an accurate reflection of the state of the system, the operator
needs to know its "characteristics" to make optimal use of its information.

If  the  operator  has  more  than  one  source  at  his  disposal,  the  situation  becomes  even  more 
involved.  Sources  of  information,  like  any  set  of  observations,  are  mutually  redundant  to  some 
degree.  Just  as  we  know  that  a  double  check  of  the  color  of  grass  will  not  add  much  to  our  beliefs, 
the  operator  might  know  that  one  source  typically  duplicates  another  source,  making  the  one 
irrelevant  in  light  of  the  other. 

Therefore,  in  order  to  incorporate  the  information  from  all  the  available  sources  within  his 
knowledge  base,  the  operator  needs  to  know  something  about  the  characteristics  of  the  sources 
individually,  and  as  a  whole.  With  this  he  can  debias  and  weigh  the  information  from  each  source 
to  combine  them,  together  with  his  own  opinions,  into  one  reliable  and  consistent  mental  model  of 
the  system. 
2.  PROBLEM  STATEMENT


The purpose of this paper is to propose a computer aiding system that can, in some well-described
situations, take over this task of combining information from different sources. We will develop
the mathematical model for the general case, that is, for any number of sources. We will also show
the results of the computer implementation, that is, the actual aiding system, when applied to an
experimental setup with two human sources of information. The experiment is a tentative one,
designed to illustrate the concepts and workings of the theory and the aiding system. Full-scale
experiments simulating actual industrial situations are presently being considered and will be
reported on at a later stage. For the experimental results, we then show that under a large class
of loss functions (i.e., no matter what he wants) the operator is better off (saves money) believing
the output of the aid than simply accepting the advice of the sources.

The  way  an  operator  would  actually  combine  information  from  different  sources,  or,  more 
appropriately,  how  he  should  do  it,  is  a  very  intricate  affair  involving  many  factors.  Of  course  we 
cannot  hope  to  automate  this  procedure  in  general.  What  we  can  (and  should)  do  is  to  state  a 
number  of  plausible  and  operational  assumptions  under  which  an  optimal  solution  to  this  problem 
exists.  If  the  operator  agrees  that  the  assumptions  accurately  describe  his  particular  situation,  he 
can  let  the  computer  aiding  system  take  over  the  task. 

For the aid to be useful, the assumptions have to apply to frequently occurring and interesting
situations. As will be made precise in the course of this manuscript, the aiding system based on
these assumptions pertains to situations in which the information sources are more knowledgeable
than the operator himself about the states on which he seeks advice. As information normally
costs something (money, trouble or some other kind of loss), this situation is fairly common; if the
operator thinks he knows more than his information sources, he need not bother with them at all.

All in all we will put forward six assumptions. To cast the problem into a numerical framework it
is necessary to make three "form" assumptions. The first form assumption, F1, imposes some
necessary structure on the state-space and the operator. The second form assumption, F2, fixes a
standard way of describing the information flow from the source to the operator. The third form
assumption, F3, then introduces a numerical way of measuring the characteristics of the sources, or
the process of calibration. We then proceed to carry through the necessary inference steps. To
accomplish this we introduce three "inference assumptions". I1 fixes a starting point, that is, a
(prior) set of beliefs when the operator has as yet no data pertinent to the characteristics of the
sources. I2 determines the relation between the calibration measurements. I3, finally, fixes how
sensitive the starting point in I1 is to these measurements. These assumptions are sufficient to
guarantee the existence of an optimal solution, i.e., a (posterior) set of beliefs over the states of the
system given the information from the information sources and their characteristics as evidenced
by the calibration data.

The paper is organized as follows. Chapter 3 introduces the three form assumptions and chapter 4
the three inference assumptions. Chapters 5 and 6 present the combination algorithm that results
from these assumptions and the results obtained when the model is applied to the experimental
setup. Chapter 7 contains the evaluation of the model, i.e., the decision aid, and chapter 8 rounds
everything off with a summary and conclusions. The setup and results of the experiment are
contained in appendices 1 to 3. Appendix 4 contains some derivations that are left out of the main
text in order not to distract attention from the main argument. Appendix 5 contains a list of the
symbols used in the course of the document.
3.  ANALYSIS  AND  NOTATION


3.1  F1:  Form  of  the  system  and  the  operator


At a particular moment in time, the system can be viewed as being in any of a number of states, in
other words, anywhere in state-space. In order for the remaining assumptions to make sense, and
for plain convenience, we shall impose some structure on this space. The only structure that the
other assumptions require is a topological one, namely that the state-space is a simply ordered
set.1 This assumption is rather minimal, and I do not think any practical system of interest exists
for which this condition does not hold. In fact, the following, much stronger condition holds so
frequently that we shall assume it: there exists a (for all practical purposes) real-valued random
variable defined on the state-space, which we call x̃0, on whose outcome, x0, the operator
bases his decisions.2 This situation occurs so often that I believe that the loss in generality is
amply counterbalanced by the gain in familiarity and ease of expression.

The experiment involved fifty-three (real-valued) variables taken from the Guinness Book of
World Records and selected for their pertinence to mechanical engineering (see appendix 1). The
maximum design sway of the Eiffel Tower, given its length (question 47), is an example of a
random variable defined on the space of extreme states of the Eiffel Tower. Similarly, the deviation
from the parallel of the towers of the Humber Estuary bridge (question 48) is a variable
defined on the states of that bridge, and so on for the capacity of the aqueduct of Carthage
(question 49), etc.

If we plan to aid the operator, we will have to assume that there are some basic principles of logic
the operator wants to adhere to. To this end, we will assume that the operator wishes to be a
Bayesian; specifically, that he wishes to comply with Savage's [1954] axioms. These axioms constitute
the minimal equipment to ensure consistency, or coherence as it is more often referred to, of the
operator's behavior. As mentioned previously, they also allow us to describe the operator's
behavior in terms of (subjective) probabilities and losses, where the operator will wish to choose
the action that carries the least expected loss. (This part of F1 justifies the reasoning in the
introduction.) It is in this sense that the decision aid will be optimal. Thus F1 ensures optimality
of the model with respect to any loss function. This does not mean that the aid cannot be wrong.
What it does mean is that there is no systematic way in which it can lose independent of the
actual state of the system. It can be shown (see de Finetti [1974]) that an operator following any
other set of axioms can be used as a "money pump", i.e., there exists a systematic way in which he
can lose independent of the actual state of the system. For the supervisory control situation these
properties are obviously extremely desirable.
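The money-pump argument can be made concrete with the standard Dutch-book construction. The sketch below is an illustration added here, not part of the proposed aid: it assumes a hypothetical operator who will buy a bet paying one unit for a price equal to his stated probability of the event, and who states P(E) = 0.6 and P(not E) = 0.6, so that his probabilities do not add up to one. The function name `operator_net_gain` is hypothetical.

```python
def operator_net_gain(p_event: float, p_complement: float,
                      event_occurs: bool, stake: float = 1.0) -> float:
    """Net gain of a (hypothetical) operator who buys two bets, each priced at
    his stated probability times the stake:
      bet 1 pays `stake` if the event E occurs,
      bet 2 pays `stake` if E does not occur."""
    price_paid = (p_event + p_complement) * stake
    payout = stake if event_occurs else 0.0   # bet 1
    payout += 0.0 if event_occurs else stake  # bet 2
    return payout - price_paid


# Incoherent beliefs, P(E) + P(not E) = 1.2: the operator loses 0.2 per unit
# stake whichever way the event turns out.
for occurs in (True, False):
    print("incoherent", occurs, round(operator_net_gain(0.6, 0.6, occurs), 2))

# Coherent beliefs, P(E) + P(not E) = 1: this pair of bets yields no sure loss.
for occurs in (True, False):
    print("coherent", occurs, round(operator_net_gain(0.6, 0.4, occurs), 2))
```

The loss in the incoherent case does not depend on whether E occurs, which is exactly the "systematic way to lose independent of the actual state" referred to above.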

Since the first part of F1 allows us to parametrize the state-space by x̃0, and the second part
allows us to summarize the relevant aspects of the operator's mental model of the system by
probabilities, we can write the operator's mental model as a distribution over x̃0 given all his
knowledge (except the information from the sources and the data about their characteristics). We
comply with tradition by leaving the conditioning on his knowledge implicit and simply
write F(x0).3 This distribution is called the a priori predictive distribution. Since, in this
document, no confusion is possible, we shall simply call it the operator's prior (on x̃0).


1. A relation ≤ is called a simple ordering (or a linear order, a weak order or an order relation) on a set S if (and only if)
for all x, y and z in S

(i) (comparability) Either x ≤ y, or y ≤ x,

(ii) (transitivity) If x ≤ y and y ≤ z, then x ≤ z.

2. We distinguish between random variables and their outcomes by providing the former with a tilde.

Now suppose that the operator obtains information from k different sources, which we label
A, B, ..., K.4 Suppose also that the operator has some idea about the reliability of sources
A, ..., K. His mental model of the system can then no longer be represented by F(x0) but should
be represented by his distribution over x̃0 given the information and the characteristics of the
sources, or,

$$F(x_0 \mid \text{Information from } A,\ldots,K,\ \text{Characteristics of } A,\ldots,K) \tag{3.1}$$

Let us call this distribution, the predictive distribution a posteriori, the posterior (on x̃0).

Thus it will be our goal to evaluate (3.1). To accomplish this we still need to perform some
analysis. Indeed, the vague terms Information from A, ..., K and Characteristics of A, ..., K
need to be replaced by precise numerical concepts. This will be covered by F2 and F3,
respectively.

Although (3.1) will give the operator a complete representation of the relevant beliefs he needs to
have, it is not at all obvious that he is able to use (3.1) effectively. We will not discuss this delicate
issue here, but refer the reader to the companion report by Roseborough and Sheridan [1986],
where this is investigated at length, both theoretically and experimentally.


3.2  F2:  Form  of  the  information 

Ideally, each source of information should give its entire probability distribution over the variable
of interest, x̃0. Obviously, and this is especially so if the sources are humans, this is not feasible.
Instead we will assume that each source is capable, possibly in a noisy and/or systematically
biased way, of giving some points, say m of them, of its distribution. The continuous case should
then be viewed as a limiting case as m → ∞. For our purposes it will prove to be most convenient
to let these points be the source's fractiles of its distribution.5

Thus, suppose we can elicit from each source m numbers which the source calls its fractiles for
probability values that have been preset. To keep track of these numbers we shall order them by
increasing magnitude and label them with the lower case of the source in question. In other
words, we define for source A a (row) vector, or ordered set, a = (a1, ..., am) that contains A's m
fractiles on the variable x̃0; similarly for sources B to K.
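To make F2 concrete, the sketch below computes a fractile vector a = (a1, ..., am) as inverse-CDF values, following the definition F⁻¹(α) of footnote 5, for the preset probabilities .1, .5 and .9 used in the experiment. It assumes, purely for illustration, that a source's beliefs about x̃0 can be summarized by a normal distribution with made-up parameters; real sources hand over only the fractiles themselves, which is why only fractiles are elicited. The helper name `fractile_vector` is hypothetical; `statistics.NormalDist.inv_cdf` is from Python's standard library.

```python
from statistics import NormalDist


def fractile_vector(belief: NormalDist, probs=(0.1, 0.5, 0.9)) -> tuple:
    """Fractiles a = (a_1, ..., a_m) of `belief` for the preset probability
    values `probs`, i.e. a_i = F^{-1}(probs_i), in increasing order."""
    return tuple(belief.inv_cdf(p) for p in sorted(probs))


# Hypothetical source A whose beliefs about x0 happen to be Normal(300, 25).
a = fractile_vector(NormalDist(mu=300.0, sigma=25.0))
print([round(v, 1) for v in a])
# The .1- and .9-fractiles bound an 80% credible interval around the median.
```

Sorting the probabilities guarantees that the fractiles come out in increasing order, matching the ordering convention adopted for a.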

If there is an m for which the above scheme works, we can replace Information from A, ..., K in
(3.1) by the ordered sets a, ..., k that contain the sources' fractiles. In other words, expression
(3.1) can now be written as

$$F(x_0 \mid a,\ldots,k,\ \text{Characteristics of } A,\ldots,K) \tag{3.2}$$

The next question is a practical one: how many and which fractiles can one reasonably ask of an
information source? Of course the answer will depend on the nature of the source. Most
equipment gives a single reading that represents its mean or its median, and the distribution
around this number is provided by the manufacturer. In this case an infinite number of fractiles
is available. For human experts or expert systems the horn of plenty is a little less full. Often human
experts give their advice in terms of a single estimate, which we can consider to be their median.
Just one fractile seems a bit meager, but if k is not too small (if there are enough experts) this
kind of information will still be usable. A more reasonable approach is to ask the expert to give
an estimate, and let that be the median of his distribution, and also an 80% confidence or credible
interval around that estimate. In fractile terms, the source gives what it claims are its .1-, .5- and
.9-fractiles. Here we of course have a = (a1, a2, a3), a1 being the .1-fractile and so on. Similarly,
source B's information is embedded in b = (b1, b2, b3). The situation is portrayed in figure
3.1 for some distribution that source A might have.

3. The notation F(·) is used for any (cumulative) distribution, i.e., any marginal distribution of the distribution over all
states of the system. Densities will be represented by f(·) and the measure itself by p(·).

4. More precisely, it is the states of information of the sources that are labeled thus.

5. An α-fractile of a distribution F(x0) is a value of x0 such that the probability that the true value is below that value
equals α, or, simply, F⁻¹(α).

[Figure 3.1: plot of the cumulative distribution F(x0 | A)]

Figure 3.1. Hypothetical fractiles a = (a1, a2, a3) of source A for the probability values (.1, .5, .9)
set in the experiment.

This was the method actually employed in the experiment. Two human subjects, source
A and source B, had to provide their .1-, .5- and .9-fractiles on the fifty-three random variables in
appendix 1. The experiment was designed to simulate the situation in which experts advise an
operator as closely as could be expected within the confines of this tentative experiment. The
subjects, both Ph.D. candidates in Mechanical Engineering at MIT, were therefore allowed to use all
standard engineering equipment, such as desk calculators. Of course they were not allowed to simply
look up the true values.


3.3  F3:  Characteristics  and  calibration 

Of course it is one thing to ask for someone's fractiles, and another to actually get them.
Even assuming that the source can introspect sufficiently to give its fractiles, there may still be
reason to doubt the unbiasedness of its opinions, or to ask to what degree it is redundant in the light
of another available information source. We will, in this chapter, propose a way to measure the
characteristics of the sources no matter what their cause is. To justify some of the choices made
in this chapter, we need to anticipate some of the results of the following chapters, hoping that the
reader will be willing to temporarily accept some statements as reasonable.

The way to determine the characteristics of a source of information is to compare its opinions
about the value of a variable with the true value of that variable. Stated a little (but not much)
more precisely, we want to measure the "distance" between the opinions and the true value. Of
course, after one variable we cannot say much; the source may have been particularly lucky or
unlucky. However, after a large number of variables for which we compare the advice with the true
values, it is not unreasonable to expect to converge to some belief about the characteristics of the
source, especially if these variables are chosen from some area of expertise of the source.

We will first propose a way of measuring "distances" between the fractiles of a source on a variable
and its true value. Suppose then that, at the time the operator obtains the advice on x̃0, he has n
variables for which he not only has the sources' advice but also the true values. We shall
call this kind of variable a calibration variable. To distinguish these from x̃0, the decision variable
at that moment, whose true value he naturally does not possess, we shall denote them by
x̃1, ..., x̃n. In practice these variables can either be selected beforehand such that the operator
knows their true values, perhaps by putting the sources on a simulator of the system, or they could
be previous decision variables whose true values have meanwhile become known; if a
source predicts the Dow Jones index for tomorrow, then tomorrow this value will be known. In
the latter case the system gradually learns the characteristics of the sources. Naturally,
combinations of these approaches can also be considered.

Concerning the division of the experimental data into decision and calibration variables, we can
imagine that each of the fifty-three variables can be viewed as a decision variable, with the preceding
variables being used as calibration variables, i.e., at variable number n+1 we assume that the
values of the previous n are meanwhile known. This is borne out in the legends to the graphs in
appendix 3. At the first variable, there are no calibration variables; if x̃0 is the first variable, then
n = 0. In general, when x̃0 is the (n+1)th variable, the operator has the previous n variables as
calibration variables at his disposal. Thus the experiment can be viewed as a sequential operation
where at each subsequent variable the operator and his aiding system have one more calibration
variable at their disposal to determine the characteristics of the sources more precisely.
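The sequential bookkeeping described above amounts to a simple split of the list of variables. The following Python sketch numbers the variables 1 to 53 in the order in which they were asked and lists, for each decision variable, the calibration variables available at that point. The helper name `sequential_splits` is hypothetical and the sketch covers only the bookkeeping, not the inference.

```python
def sequential_splits(n_variables: int):
    """Yield (decision_number, calibration_numbers) for a sequential run:
    when the (n+1)th variable is the decision variable, variables 1..n,
    whose true values are assumed known by then, serve as calibration variables."""
    for n in range(n_variables):
        decision = n + 1
        yield decision, list(range(1, decision))


splits = list(sequential_splits(53))   # fifty-three variables, as in the experiment
for decision, calibration in (splits[0], splits[1], splits[-1]):
    print(f"decision variable {decision}: n = {len(calibration)} calibration variables")
```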

Now there are of course many ways of defining the distance between the advice on a variable and
its true value. From form considerations alone, however, the following can be said. Consider, for
instance, the variables in the experiment. One variable is measured in miles per hour, another in
degrees Kelvin, etc. If we wish to make any kind of comparison between these distances, we have
to introduce some way of measuring distance that is independent of the particular scale the
variable itself is measured in. One way to do this is to simply note between which fractiles the true
value falls. No matter how the scale is changed (by any strictly increasing transformation), this
statistic will be invariant. If we obtain three fractiles from each source, then the true value can
fall in any of four interfractile ranges, or bins as we shall call them: the true value is either smaller
than a1, or between a1 and a2, or between a2 and a3, or larger than a3. We shall call the first
interfractile range the zeroth bin, the second the first bin and so on up to the third bin.
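The binning rule and its scale invariance can be checked with a few lines of Python. This is a sketch: `bin_index` is a hypothetical helper, the fractile and true values are made up, and the convention that a true value exactly equal to a fractile goes into the lower bin is an assumption, since the text does not say how ties are handled.

```python
import math
from bisect import bisect_left


def bin_index(fractiles, true_value) -> int:
    """Interfractile range ("bin") in which true_value falls: 0 below the
    first fractile, len(fractiles) above the last one.
    Assumption: a value equal to a fractile is counted in the lower bin."""
    return bisect_left(sorted(fractiles), true_value)


a = (10.0, 20.0, 30.0)              # hypothetical .1-, .5- and .9-fractiles
values = (5.0, 15.0, 25.0, 35.0)    # hypothetical true values
print([bin_index(a, x) for x in values])

# Scale invariance: a strictly increasing change of scale (a unit conversion,
# a logarithm) applied to fractiles and true value alike leaves the bin unchanged.
for rescale in (lambda v: 1.609 * v, math.log):
    assert all(bin_index([rescale(f) for f in a], rescale(x)) == bin_index(a, x)
               for x in values)
```

`bisect_left` counts how many fractiles lie strictly below the true value, which is exactly the bin number as defined above.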

For notational purposes it is convenient to introduce, for every variable x̃i (i = 0, 1, ..., n), a new
random variable z̃i that indicates in which bin the true value has fallen or might fall. The way in
which this is actually done is chosen purely for mathematical convenience and need not concern us
here (see appendix 4). We shall call z̃i the calibration indicator for variable x̃i. The importance of
the calibration indicators, the z̃i, is that they have the same scale for each i, although the scales of
the corresponding x̃i may be totally incommensurable. This fact is essential for the inference as
set out in chapter 4.

When we have two sources, we have (3+1)(3+1) = 16 different bins, and the calibration
indicators can be generalized accordingly. Let us agree to number the bins of the double-source
case with two-digit numbers, the first indicating in which (marginal) bin of source A the true
value falls, and the second in which (marginal) bin of source B. We reserve the letter j to
represent a generic bin number. For notational convenience, we shall think of j as a k-dimensional
vector, j = [jA, ..., jK], having an element for each source indicating in which
(marginal) bin of that source the true value fell. Thus, for the two-source case, if the true value
falls in A's zeroth bin and in B's second bin, we will say it fell in bin j = [jA, jB] = [0, 2]. Before we
state F3 and proceed with the inference assumptions, we will first discuss intuitively why this way
of measuring distance using calibration indicators is reasonable from an inferential point of view,
that is, what we may hope to expect from it when using the inference assumptions to update the
operator's distribution over x̃0.
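For several sources, the bin vector j is obtained by binning the same true value separately against each source's fractiles. The sketch below uses a hypothetical helper and made-up fractiles, chosen so that the result reproduces the [0, 2] example in the text; ties are again assumed to go to the lower bin.

```python
from bisect import bisect_left
from itertools import product


def joint_bin(fractiles_by_source, true_value) -> tuple:
    """Bin vector j = [j_A, ..., j_K]: the marginal bin of the true value
    for each source (ties go to the lower bin, an assumption)."""
    return tuple(bisect_left(sorted(f), true_value) for f in fractiles_by_source)


a = (10.0, 20.0, 30.0)   # hypothetical fractiles of source A
b = (2.0, 5.0, 40.0)     # hypothetical fractiles of source B
j = joint_bin([a, b], 8.0)
print(j, "bin number", "".join(str(jj) for jj in j))

# With three fractiles per source and two sources there are (3+1)*(3+1) bins.
all_bins = list(product(range(4), repeat=2))
print(len(all_bins))
```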

If a source were unbiased we would expect that, in the long run, 10% of the true values would fall
in the zeroth and 10% in the third bin, since the source assigns probability .1 to these bins. Similarly,
40% of the true values should fall in the first and 40% in the second bin. A source for which
this is so is called well-calibrated, a rather unusual state of affairs. Suppose now that, for large n,
most of the true values fall in the zeroth and third bins. Apparently, when the source says the
events are unlikely (probability .1), they occur far too often, and when it says the events almost
surely happen (probability .9: zeroth, first and second bin together) they do not happen often
enough. Such a source can be called overconfident and we would, on the basis of this information,
like to correct for this overconfidence when determining the posterior on x̃0 by making its
distribution proportionally "flatter". Similarly, a source can be underconfident by having too many
true values fall in the middle two bins. If too many true values fall in the zeroth and first bins, the
source's fractiles tend to lie above the true values, and we would say that the source consistently
overestimates the magnitude of the quantities, also something one could compensate for, and so on.
Thus, in the single-source case the sequence of calibration indicators should be capable, under
appropriate conditions, of identifying a source's biases.
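The long-run reasoning above can be illustrated by comparing observed bin frequencies with the probabilities .1, .4, .4, .1 that a source with three fractiles assigns to its bins. The sketch below is only a crude frequency heuristic with an arbitrary tolerance; it is not the inference procedure of chapter 4, which is meant precisely to handle finite n properly. The helper names, the tolerance and the calibration records are all hypothetical. True values piling up in the lower bins mean the source's fractiles lie above the truth, i.e., the source overestimates.

```python
from collections import Counter

EXPECTED = (0.1, 0.4, 0.4, 0.1)   # probabilities a source assigns to bins 0..3


def hit_rates(bins, n_bins: int = 4) -> tuple:
    """Observed fraction of calibration variables whose true value fell in each bin."""
    counts = Counter(bins)
    return tuple(counts[j] / len(bins) for j in range(n_bins))


def crude_diagnosis(rates, tol: float = 0.05) -> list:
    """Hypothetical rule of thumb following the reasoning in the text."""
    flags = []
    outer = rates[0] + rates[3]             # well-calibrated source: about 0.2
    if outer > EXPECTED[0] + EXPECTED[3] + tol:
        flags.append("overconfident")
    elif outer < EXPECTED[0] + EXPECTED[3] - tol:
        flags.append("underconfident")
    lower = rates[0] + rates[1]             # true value below the median: about 0.5
    if lower > 0.5 + tol:
        flags.append("overestimates")       # fractiles tend to lie above the truth
    elif lower < 0.5 - tol:
        flags.append("underestimates")
    return flags or ["no gross miscalibration detected"]


# Hypothetical sequences of calibration indicators (bin numbers) for ten variables.
for bins in ([0, 3, 0, 3, 1, 2, 0, 3, 3, 0],
             [1, 2] * 5,
             [0, 1, 1, 1, 1, 1, 1, 2, 1, 0]):
    rates = hit_rates(bins)
    print(rates, crude_diagnosis(rates))
```

With only ten variables such hit rates are very noisy, which is one reason the report does not rely on long-run frequencies alone.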

When the operator has two sources, A and B, at his disposal, the calibration indicators also
measure, in a specific sense, the degree of interdependence between the two sources. It may be the
case that, when A underestimates, B consistently overestimates. By recording at the same time in
which of A's and B's bins a true value falls, the sequence of calibration indicators can thus also
indicate the interdependence between the two sources, that is, measure the characteristics of A and
B as a group or panel of information sources.
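One way to see how a joint sequence of calibration indicators can reveal interdependence is to tabulate the two-source bins and compare each joint frequency with the product of the marginal frequencies. This is a descriptive sketch with hypothetical helper names and made-up records; the "independence gap" statistic is a simple choice made here, not a measure proposed in this report.

```python
from collections import Counter
from itertools import product


def joint_table(pairs, n_bins: int = 4):
    """Relative frequency of each two-source bin [j_A, j_B] over the calibration variables."""
    counts = Counter(pairs)
    n = len(pairs)
    return [[counts[(ja, jb)] / n for jb in range(n_bins)] for ja in range(n_bins)]


def independence_gap(table) -> float:
    """Largest absolute difference between an observed joint frequency and the
    product of the corresponding marginal frequencies (0 means the table
    factorizes exactly, as expected for independent calibration indicators)."""
    row = [sum(r) for r in table]             # marginal frequencies of A's bins
    col = [sum(c) for c in zip(*table)]       # marginal frequencies of B's bins
    return max(abs(table[i][j] - row[i] * col[j])
               for i in range(len(row)) for j in range(len(col)))


# Hypothetical records: when A underestimates (true value in A's upper bins),
# B overestimates (true value in B's lower bins), and vice versa.
dependent = [(3, 0), (2, 1), (3, 0), (2, 1), (1, 2), (0, 3), (1, 2), (0, 3)]
# Every joint bin observed exactly once: the table factorizes.
independent = list(product(range(4), repeat=2))

print(independence_gap(joint_table(dependent)))
print(independence_gap(joint_table(independent)))
```

In the dependent record each source's marginal hit rates look uniform, so the pattern is visible only in the joint table, which is why the text insists on recording both bins at the same time.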

Of course, the number of different biases and interdependencies (i.e., characteristics) that can be
identified is potentially infinite, and we will, therefore, not attempt to do this. The important
point is that, at least intuitively speaking, whatever the biases and interdependencies may be, they
will be partially embedded in the particular sequence of calibration indicators, z1, ..., zn, and
this embedding will become more and more complete as n grows. These considerations suggest
replacing Characteristics of A, ..., K in (3.2) by a sequence of calibration indicators, z1, ..., zn.

This  sequence  should  enable  the  aid  to  correct  for  all  the  existing  biases  as  much  as  possible  for  a 
given  value  of  n.  In  the  following  we  will  indeed  assume  that  we  can  do  this,  and  thus  write  for 
(3.2) 

$$F(x_0 \mid a,\ldots,k,\ z_1,\ldots,z_n) \tag{3.3}$$

This assumption entails that the operator's knowledge about the characteristics of the sources
consists solely of the sequence of n calibration indicators. Often this will be approximately true, or
at least the operator will be so unsure about his personal evaluations that he is willing to disregard
them for all but very small values of n. In other cases the operator will have to express his
knowledge beyond the calibration indicator sequence in terms that are compatible with that
sequence, in order to incorporate this knowledge with the evidence from the calibration indicators.
Some more will be said on this in appendix 4. Meanwhile, we will assume that from now on (3.3)
represents the appropriate posterior.

It should be clear that calibration does not measure how much or how little a source knows, but how well it knows what it knows, or better, how well it can state what it knows. If a source does not know much about a certain variable, it should choose its fractiles far enough apart to reflect its ignorance. If it thinks it knows a lot, it should choose its fractiles close together. Fractiles that are too close together make the source overconfident; fractiles that are too far apart make it underconfident. Whether knowledgeable or ignorant, in general or on a particular variable, a source can be well-calibrated.
An important idea underlying the above reasoning, which will be made precise by I2, is that knowing what one knows is relatively constant for each source and can be measured by determining the calibration indicators for many different variables. That is, we expect our beliefs, and the operator's, concerning the characteristics of the sources to converge, as $n$ grows, to the true biases of the sources.
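A minimal Python sketch of how calibration indicators and hit rates could be computed, assuming each source states three fractiles and that a true value exactly equal to a fractile is counted in the lower bin (a tie-breaking convention we assume; the text does not specify one). The names `calibration_indicator` and `hit_rates` are hypothetical, and the data are invented for illustration only.

```python
from bisect import bisect_left


def calibration_indicator(fractiles, true_value):
    """Return the bin index j into which true_value falls.

    Bin 0 lies below the first fractile and bin len(fractiles) above the last.
    A value exactly equal to a fractile is put in the lower bin (assumed convention).
    """
    return bisect_left(sorted(fractiles), true_value)


def hit_rates(fractile_sets, true_values):
    """Relative frequencies y_j / n of true values per bin over n variables."""
    m = len(fractile_sets[0])
    counts = [0] * (m + 1)
    for fractiles, x in zip(fractile_sets, true_values):
        counts[calibration_indicator(fractiles, x)] += 1
    n = len(true_values)
    return [c / n for c in counts]


# Invented data: a source whose three fractiles hug its median.
fractile_sets = [(9.5, 10.0, 10.5), (19.0, 20.0, 21.0), (4.8, 5.0, 5.2), (99.0, 100.0, 101.0)]
true_values = [7.0, 20.5, 6.0, 90.0]
print(hit_rates(fractile_sets, true_values))  # [0.5, 0.0, 0.25, 0.25]
```

With fractiles at .1, .5 and .9, a well-calibrated source should in the long run show hit rates near .1, .4, .4 and .1; the invented source above puts three of its four hits in the extreme bins, the pattern of an overconfident source whose fractiles are too close together.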


4.  INFERENCE 


The previous section contained a lot of intuitive reasoning with many inherent assumptions, such as the existence and meaningfulness of hit rates, the convergence of the operator's beliefs and so on. Moreover, the arguments were based on the long run, i.e., on infinite $n$, which, strictly speaking, never applies in practice. Indeed, especially for human sources, calibration data are difficult to obtain, so solutions for the finite case must also be available. (Even if a large number of calibration indicators is available, the question would still remain which number is large enough.) We shall now state three assumptions which will provide the operator (if he finds them acceptable for his situation) with a unique posterior for any number $n$ of calibration questions. The assumptions will also be seen to justify the intuitive reasoning of the previous section.

From the point of view of statistical inference, the posterior (3.3) can be analyzed in terms of two separate updatings: one on the information from the sources, and one on the characteristics of the sources. The actual inference will not be done on $x_0$, but on the corresponding calibration indicator, $z_0$. The corresponding distributions on $x_0$ will then be derived from these. Schematically, we shall pass through the following inference steps:


[Figure 4.1 (schema): arrows link the distributions $F(x_0)$, $F(z_0)$, $F(z_0 \mid z_1, \ldots, z_n)$, $F(z_0 \mid a, \ldots, k)$, $F(z_0 \mid a, \ldots, k, z_1, \ldots, z_n)$, $F(x_0 \mid a, \ldots, k)$ and $F(x_0 \mid a, \ldots, k, z_1, \ldots, z_n)$.]

Figure 4.1. Schema of the steps involved in the inference to the posterior, $F(x_0 \mid a, \ldots, k, z_1, \ldots, z_n)$. The small arrow indicates an alternate route.


I1 will concern the form of $F(z_0)$ and how to obtain from it $F(z_0 \mid a, \ldots, k)$ and $F(x_0 \mid a, \ldots, k)$, the distributions over the calibration indicator of the decision variable and over the decision variable itself, respectively, after the operator has obtained the information from the sources but has no information pertinent to their reliability. The remaining two assumptions will be concerned with the conditioning on $z_1, \ldots, z_n$ and the subsequent conditioning on $a, \ldots, k$.

4.1 I1: Prior beliefs


If one wants to get somewhere, one has to start somewhere. Thus, if we want to calculate a posterior distribution, we first have to establish a prior distribution. The starting points in the above schema are $F(x_0)$ and $F(z_0)$, distributions which are conditioned neither on the information from the sources nor on any outcomes of calibration indicators. Lacking better beliefs about the characteristics of the sources, owing to F3, it seems reasonable to give the sources the benefit of the doubt and simply to start off believing their statements. In other words, the operator assumes the sources are well-calibrated. The operator can then rely on the conditioning on the calibration indicators to disprove a source if necessary. If nothing else, the principle appeals to some notion of fairness, somewhat reminiscent of common practice in courtrooms, where the accused is presumed innocent until proven otherwise by the evidence.

If the operator has one source at his disposal, it is clear what this assumption means, namely that $F(z_0)$ is simply given by the probabilities at which the source gave its fractiles. In the experiment the fractiles were given at .1, .5 and .9, so the zeroth through the third bin receive probabilities .1, .4, .4 and .1. For the multi-source case, the interpretation of the assumption has to be extended a little. Suppose, as in the experiment, that the operator has two sources at his disposal. As discussed under F3, this results in a total of sixteen bins, which we can arrange in the following matrix:

Figure 4.2. Prior bin probabilities in the double-source four-fractile case, as in the experiment.


The bin numbers (the values of $j$) are shown in small print and the probabilities in normal print. The single-source interpretation fixes the marginal bin probabilities, which are shown at the sides of the table. Of course these are not enough to specify all sixteen bin probabilities. We will take the statement of I1 to mean also that the operator considers the sources, a priori, to be "independent" in the sense that the probability of the true value falling in a certain bin of A and a certain bin of B is the product of the marginal bin probabilities. For example, the prior probability that the true value falls in the zeroth bin of both A and B is $.1 \cdot .1 = .01$. Extensions to the $k$-source case are obvious.
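The independence reading of I1 amounts to an outer product of the marginal bin probabilities. The sketch below assumes the experiment's marginals (.1, .4, .4, .1) for both sources; `independent_prior` is a hypothetical helper that also accepts more than two sources, giving $(m+1)^k$ cross-bins.

```python
def independent_prior(*marginals):
    """Prior over cross-bins as the product of single-source marginal bin probabilities.

    Returns a dict mapping a tuple of bin indices (one per source) to a probability.
    """
    prior = {(): 1.0}
    for marginal in marginals:
        prior = {
            key + (j,): p * q
            for key, p in prior.items()
            for j, q in enumerate(marginal)
        }
    return prior


# Marginal bin probabilities implied by fractiles at .1, .5 and .9.
marginal = [0.1, 0.4, 0.4, 0.1]
prior = independent_prior(marginal, marginal)
print(len(prior), round(prior[(0, 0)], 4), round(prior[(1, 2)], 4))  # 16 0.01 0.16
```

A dictionary keyed by bin tuples is used rather than a nested matrix so that the same helper covers any number of sources.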

The next step is to condition $F(z_0)$ on the information from the sources to obtain $F(z_0 \mid a, \ldots, k)$. In the single-source case this is straightforward. Since, by F3, $a, \ldots, k$ do not contain any information about the characteristics of the sources, it follows that, for instance, $F(z_0) = F(z_0 \mid a)$. Appendix 4 contains a more thorough discussion. In the multi-source case, however, things are slightly less straightforward. The problem is that, given the advice, not all bins remain possible under any reasonable $F(x_0)$. If, for instance, $a_i$ is smaller than or equal to $b_i$, the true value cannot at the same time lie below $a_i$ and above $b_i$. It follows that, given the advice, at most $(3 + 3 + 1 =)$ seven of the original sixteen bins remain (and at least four, if the $a_i$ all coincide with the $b_i$). The appropriate conditioning can here be achieved by simply renormalizing the probabilities of the remaining bins. This is expanded on in more detail in appendix 4.
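The renormalization can be sketched for two sources under assumptions of ours: each source's fractiles are given in increasing order, the prior is the independent product of the marginals, and a cross-bin is kept if some value of $x_0$ can fall in it. The helper name `condition_on_advice` is hypothetical; appendix 4 contains the authors' own treatment.

```python
from bisect import bisect_left


def condition_on_advice(fractiles_a, fractiles_b, marginal_a, marginal_b):
    """Drop impossible cross-bins (i, j) given both sources' fractiles and renormalize.

    The union of the fractiles cuts the real line into intervals; every value of
    x_0 inside one interval has the same A-bin i and B-bin j, so each interval
    identifies one possible cross-bin. Fractiles must be in increasing order.
    """
    cuts = sorted(set(fractiles_a) | set(fractiles_b))
    # One probe point inside each interval, including the two unbounded ends.
    probes = [cuts[0] - 1.0]
    probes += [(lo + hi) / 2.0 for lo, hi in zip(cuts, cuts[1:])]
    probes.append(cuts[-1] + 1.0)
    feasible = {(bisect_left(fractiles_a, x), bisect_left(fractiles_b, x)) for x in probes}
    # Prior is the independent product of the marginals, as under I1.
    total = sum(marginal_a[i] * marginal_b[j] for i, j in feasible)
    return {(i, j): marginal_a[i] * marginal_b[j] / total for i, j in feasible}


marginal = [0.1, 0.4, 0.4, 0.1]
disjoint = condition_on_advice((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), marginal, marginal)
identical = condition_on_advice((1.0, 2.0, 3.0), (1.0, 2.0, 3.0), marginal, marginal)
print(len(disjoint), len(identical))  # 7 4
```

With three fractiles each, the disjoint case keeps $3 + 3 + 1 = 7$ cross-bins and the coinciding case keeps 4, matching the bounds stated in the text; a single shared fractile leaves six.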

The distribution $F(z_0 \mid a, \ldots, k)$ constrains the distribution $F(x_0 \mid a, \ldots, k)$ by prescribing the total amount of probability mass on the interfractile ranges, i.e., on the bins. For instance, in the single-source case .4 of the probability should be assigned between $a_1$ and $a_2$. There are several ways to fix the shape of $F(x_0 \mid a, \ldots, k)$ within the bins themselves. In the experiments we simply assumed that this distribution is uniform within each bin. In appendix 3-A these densities are shown by a dashed line for the case that only source A is available, $F(x_0 \mid a)$, and appendix 3-B shows the same when B is the only available source, $F(x_0 \mid b)$. Appendix 3-AB shows, again in dashed lines, $F(x_0 \mid a, b)$.
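A sketch of a density that is uniform within each bin. Because the zeroth and last bins are unbounded, the sketch assumes hypothetical finite outer bounds (the first and last entries of `edges`); the text does not say how the experiment handled this. `piecewise_uniform` is a hypothetical name.

```python
from bisect import bisect_right


def piecewise_uniform(edges, bin_probs):
    """Return a density that is uniform within each bin.

    edges: increasing boundaries [lower, f_1, ..., f_m, upper]; `lower` and
    `upper` are assumed finite outer bounds for the otherwise unbounded outer bins.
    bin_probs: probability mass of each of the len(edges) - 1 bins.
    """
    heights = [p / (hi - lo) for p, lo, hi in zip(bin_probs, edges, edges[1:])]

    def density(x):
        if x < edges[0] or x >= edges[-1]:
            return 0.0
        return heights[bisect_right(edges, x) - 1]

    return density


# Hypothetical source fractiles 8, 10, 11 with assumed bounds 0 and 20.
edges = [0.0, 8.0, 10.0, 11.0, 20.0]
f = piecewise_uniform(edges, [0.1, 0.4, 0.4, 0.1])
print(f(9.0), f(10.5))  # 0.2 0.4
```

The mass between the first two fractiles, $0.2 \times 2 = .4$, is the .4 mentioned in the text; narrower bins simply get taller densities.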


4.2 I2: Order of the calibration indicators


In chapter 3 there were several references to the relative frequency of true values falling in a certain bin, or the hit rate of that bin. Let $y_j$ be the number of true values falling in bin $j$. These references then imply that the ratios $y_j / n$ are all that is needed from the entire sequence of calibration indicators $z_1, \ldots, z_n$ to determine a source's calibration. Statisticians would say that these ratios are considered to be sufficient statistics for the sequence of calibration indicators.

We will actually make a somewhat weaker assumption, namely that the counts $y_j$ are sufficient statistics for the sequence. (Clearly the ratios can be constructed from these counts. Thus if the ratios are sufficient, the counts are sufficient, but not vice versa.) A logically equivalent, but intuitively more appealing, statement of this is that we (and hopefully the operator with us) consider the order in which the calibration indicators appear to be irrelevant.6 In statistics, such variables are known as "exchangeable" random variables, since one may be exchanged for another (of course prior to obtaining the outcome) without changing the probabilities and, therefore, the decisions.

The assumption of exchangeability is a very typical assumption when one is measuring a quantity. When one flips a coin to determine the limiting frequency of, say, heads, or when one makes multiple measurements of the same constant quantity, one assumes that the order in which the outcomes appear is irrelevant; a simple count of the total number of heads and tails is considered to be fully adequate.
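Exchangeability does not mean independence. A Pólya urn is a standard example of an exchangeable but dependent process; we use it here purely as an illustration, not as the model the report derives. The sketch checks that every ordering of the same calibration outcomes receives the same probability, so only the counts matter. The function name and the urn weights are hypothetical; exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import permutations


def polya_sequence_prob(sequence, weights):
    """Probability of a sequence of bin indices under a Polya urn.

    The urn starts with weight weights[j] on bin j; after each draw, one unit of
    weight is added to the bin that was drawn.
    """
    counts = [0] * len(weights)
    total = sum(weights)
    prob = Fraction(1)
    for t, j in enumerate(sequence):
        prob *= (weights[j] + counts[j]) / (total + t)
        counts[j] += 1
    return prob


weights = [Fraction(1, 10), Fraction(4, 10), Fraction(4, 10), Fraction(1, 10)]
sequence = (0, 3, 0, 2)
distinct = {polya_sequence_prob(s, weights) for s in permutations(sequence)}
print(len(distinct))  # 1: every ordering with the same counts is equally probable
```

Each draw changes the probabilities of later draws, yet the probability of the whole sequence depends only on how many times each bin occurs, which is exactly the property I2 asks of the calibration indicators.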

As a direct notational result, we can now replace the sequence of calibration indicators, $z_1, \ldots, z_n$, by the counts of the hits in the different bins. For the single-source case with four different bins it is convenient to define the vector $y = (y_0, \ldots, y_3)$ (see also appendix 4). In the two-source case we can define $y$ to be an $(m+1) \times (m+1)$ matrix, which, for $m = 3$, has the same dimensions as the matrix in Figure 4.2. We will use the same subscripts for the elements of $y$. Thus $y_{00}$ represents the number of true values that fell in both A's and B's zeroth bin. In the general case, $y$ becomes a $k$-dimensional array; the details are in appendix 4. For the present purposes it is sufficient to realize that $y$ represents the counts of the true values falling in the $(m+1)^k$ bins. The posterior (3.3) can now be shortened with this notation as follows:

$$F(x_0 \mid a, \ldots, k, y), \qquad (4.1)$$

and similarly for all other distributions containing $z_1, \ldots, z_n$. Appendices 3-A and 3-B contain the two single-source cases. The counts that are available at the time the information on a particular decision variable is obtained are shown underneath the graphs.
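A sketch of the count matrix $y$ for two sources with $m = 3$ fractiles each. The calibration history is invented, and `count_matrix` is a hypothetical helper.

```python
def count_matrix(cross_bins, m=3):
    """Counts y[i][j] of true values that fell in A's bin i and B's bin j."""
    y = [[0] * (m + 1) for _ in range(m + 1)]
    for i, j in cross_bins:
        y[i][j] += 1
    return y


# Invented calibration history for n = 4 variables: (A's bin, B's bin).
history = [(0, 0), (3, 3), (0, 0), (2, 3)]
y = count_matrix(history)
print(y[0][0], sum(map(sum, y)))  # 2 4
print([sum(row) for row in y])  # A's single-source counts: [2, 0, 1, 1]
```

Reversing `history` gives the same matrix, which is exactly what I2 licenses. Row and column sums recover each source's own counts, but the matrix keeps more: it records which bins of A and B were hit together, i.e., the interdependence of the sources.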

Although the assumption of exchangeability seems innocuous enough, it has some surprisingly strong consequences. First of all, if one calibration indicator can be exchanged, a priori, for another, it follows that the operator's priors on the indicators must be equal. Mathematically, let $z_i$ and $z_l$ be any two calibration indicators, $i, l = 1, \ldots, 53$, and pretend that the operator does not know their outcomes yet but assumes I2; then

$$p(z_i) = p(z_l). \qquad (4.2)$$

I1 effectively takes care of this equality. It states not only that the priors are equal, but also to which distribution they are equal (the one claimed by the sources). Using the same reasoning it also follows that the conditional probabilities of the calibration indicators $z_i$ are equal.7 In particular, let $z_i$ and $z_l$ be any two calibration indicators. By exchangeability we then have

$$p(z_i \mid y) = p(z_l \mid y), \qquad (4.3)$$

saying, in words, that the posteriors on calibration indicators are equal, given equal calibration data. Therefore, assuming exchangeability, the operator should not only consider the calibration indicators to be equally distributed a priori, but he should also change his beliefs about them in the same way given equal evidence. An incomplete and imprecise, but very suggestive, way of saying the same thing is that the operator should be just as sure or confident about his beliefs in all the calibration indicators.

However, "just as confident" does not say how confident, just as "equally distributed" does not say how the indicators are distributed. Therefore, we need an assumption that sets the level of confidence in the calibration indicators in the same spirit in which I1 sets the distribution of the calibration indicators. In other words, we have to extend I1 to take care of all the probability dynamics. Given the structure imposed by I2 this is feasible, and it will be done by the following and final assumption, I3.

7. Actually, (4.2) is a special case of (4.3).


4.3 I3: Confidence in calibration indicators

Since it is usually easier to identify extremes, let us begin by investigating the two extreme cases in which the operator is, respectively, totally confident and least confident about his prior on the calibration indicators.

If the operator is totally sure about the prior as postulated in I1, he will be unwilling to change his beliefs, no matter what calibration data are available. Mathematically we can say that

$$p(z_i \mid y) = p(z_i) \qquad (4.4)$$

for all $y$. In other words, such an operator is unwilling to learn from experience, since he considers that he knows it all a priori anyway. Calibration data are completely useless for him, and consequently so would this research be.

However, when an operator seeks advice he is typically not at all sure about his beliefs on the variable. The other extreme, wherein the operator is as unconfident about his beliefs as is Bayesianly possible, would therefore be much more practical. An operator could reason that he is willing to accept the prior in I1, but that he will want to change it as quickly as possible on incoming evidence. This kind of assumption has the appeal of giving the data a maximum of influence on the posterior (and the prior a minimum).

Of course there are many shades of gray between the two extremes cited here. Establishing which level of confidence in the prior on the calibration indicators the operator really wants will require input from him. We will assume here that the operator is indeed as unconfident as possible in the prior. The formal statement of this assumption relies upon de Finetti's theorem and is presented in appendix 4, which also contains a discussion of how to proceed when the operator is not willing to accept I3.
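To make the two extremes tangible, the sketch below uses the familiar Dirichlet-multinomial predictive, in which a single concentration parameter `c` measures confidence in the prior. This parametrization is an assumption of ours for illustration only; it is not necessarily the form that the report derives from I3 via de Finetti's theorem in appendix 4. As `c` grows without bound the posterior stays at the prior, which is case (4.4); as `c` shrinks the data dominate and the posterior approaches the hit rates $y_j / n$. The function name and the counts are hypothetical.

```python
def predictive(prior, counts, concentration):
    """Posterior bin probabilities (y_j + c * p_j) / (n + c).

    prior: bin probabilities p_j from I1; counts: hits y_j; concentration c > 0
    encodes how confident the operator is in the prior (assumed parametrization).
    """
    n = sum(counts)
    return [(y + concentration * p) / (n + concentration) for p, y in zip(prior, counts)]


prior = [0.1, 0.4, 0.4, 0.1]
counts = [5, 2, 3, 10]  # invented: n = 20, many hits in the extreme bins
for c in (1e9, 10.0, 1e-3):
    print(c, [round(q, 3) for q in predictive(prior, counts, c)])
```

With no calibration data ($n = 0$) the function returns the prior for every `c`, which is the benefit of the doubt granted by I1.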


5.  POSTERIOR  ON  THE  CALIBRATION  INDICATOR 


The three form assumptions and the three inference assumptions stated above are sufficient to determine a unique posterior distribution over the bins of $x_0$, i.e., over $z_0$. For the details of the derivation, the interested reader is referred to appendix 4. Here we shall merely list an approximate result for the case wherein $p(z_0) \le .5$, which holds in the experiment. Where $z_0 = j$ denotes the event that the true value of $x_0$ falls in bin $j$, we find that

p(*ob)= — —  (5.1) 

"+7uo 

Conditioning on the information $a, \ldots, k$ to pass to $F(z_0 \mid a, \ldots, k, y)$ can be achieved in exactly the same way as described under I1 for passing from $F(z_0)$ to $F(z_0 \mid a, \ldots, k)$: by throwing out the impossible bins and renormalizing. See appendix 4 for the details.

Alternatively, we could have taken $F(z_0 \mid a, \ldots, k)$ as a starting point and obtained $F(z_0 \mid a, \ldots, k, y)$ as follows:

p(z0|a,  .  .  .  ,k,  y)  =  - -  (5.2) 

n%(*ol<*.  •••,*) 

Thus, the posterior on a particular bin is a function of the prior, the count of hits in the bin and the total number of counts in the existing bins for that indicator. The prior is determined by I1 and the claims of the sources. The $y_j$ and $n$ are determined by the outcomes of the previous calibration indicators together with I2. They are listed underneath the graphs in appendices 2 and 3. Thus, the posterior on the calibration indicator for the decision variable is uniquely determined by (5.1) or (5.2).

It is interesting to note that in (5.1), as $n$ grows large and as long as $p(z_0) \ne 0$, the posterior on the bins approaches the ratio $y_j / n$, as conjectured in section 3.3. Thus, in the long run, the operator's posterior beliefs will indeed converge to the hit rates of the bins. For the finite case (i.e., all practical cases), however, the posterior is determined by both the prior and the calibration data, and the hit rates alone are insufficient. The solution thus extends the intuition of chapter 3 in that it also covers the finite case ($n < \infty$).

Appendix 2 shows the posterior on the calibration indicator of several decision variables. On the horizontal axis the priors on the calibration indicators, $F(z_0 \mid a)$, $F(z_0 \mid b)$ or $F(z_0 \mid a, b)$, are plotted. On the vertical axis the posterior on the calibration indicators is shown. If the source is well calibrated, these should be equal for all variables, and the points should all lie on the line making a 45 degree angle with both axes. Of course, on the first variable this is, by assumption, the case, since there are no calibration data yet to disprove the source. Thus variables A-1, B-1 and AB-1 in appendix 2 all display this phenomenon. The values are indicated by little boxes.

At variable A-2 we see that the true value of the first variable fell in A's zeroth bin (as can also be verified in appendix 3A-1), so the posterior on the calibration indicator of A-2 is adjusted to make the zeroth bin somewhat more probable and the remaining bins proportionally less probable. At variable A-5, more evidence has arrived that A overestimates (true values falling in the zeroth bin), and the plot already takes on the typical shape of an overconfident source (too many hits falling in the extreme bins). At A-10 the source is just overconfident, and at A-20 it looks like an overconfident underestimator. At A-53, the last variable, it is (provisionally) concluded that source A is somewhat overconfident and tends to underestimate the quantities in the experiment. The posterior expresses this by lending more probability to the zeroth and third bins and, of these two, the most to the third bin.
One can see that initially the posterior on the calibration indicator is very sensitive to incoming calibration data. This is due to I3, which postulates that the operator is as unconfident as possible, within the constraints of I2, about his prior beliefs as given in I1. As he accumulates calibration data, however, the operator becomes more and more convinced of the characteristics of his source, so that at, say, variable A-40 his beliefs are significantly less sensitive to incoming calibration data. A previously performed Monte Carlo simulation shows that the posterior on the calibration indicators converges to its limit, for all practical purposes, after about 500 trials.

Also shown in appendix 2 are the calibration results for source B. It so happened that for source B, too, the true value of the first variable fell in its zeroth bin, so plot B-1 is equal to A-1. After fifty-three variables, B shows significantly more overconfidence than A and also a strong tendency to underestimate. Both these characteristics are somewhat expected for our human sources. Overconfidence is a general bias that people have. Since the questions mostly involved records, one can conjecture that records tend to be unexpectedly high in general, so that people tend to underestimate them. Comparing B-53 to A-53, we can say that source A is better calibrated than B. However, this does not necessarily imply that A is also more informative than B, either a priori or a posteriori, since informativeness also depends on the location of the source's fractiles. This will be the subject of chapter 6.

Appendix 2AB contains the calibration plots for the two-source case. In general these contain seven data points. However, sometimes A and B each have a fractile at the same location, in which case there are only six distinct ones (e.g., on variable 10). Variable AB-1 is, of course, the familiar straight line at 45 degrees. The subsequent calibration plots show behavior similar to that in the results for A and B alone.

For readers familiar with standard calibration plots it should be noted that there are two important differences between those and the plots in appendix 2. First of all, the results in appendix 2 are valid for all variables, i.e., no matter how large $n$, the number of available calibration indicators, is. Standard calibration plots are only valid for large $n$; strictly speaking, only for $n = \infty$. Secondly, the results for the two-source case are novel in that they are not obtained by simply adding the counts of the two sources for each bin. They are based on counts of all possible cross-bins, and they measure calibration in the one-source sense plus the interdependence of the two sources.

For these readers it may also be interesting to realize that the standard definition of calibration in the literature entails I2, that is, exchangeability of the calibration indicators. Indeed, any definition that uses hit rates to determine calibration in the long run considers $y$ to be a sufficient statistic, which is equivalent to assuming exchangeability. The consequences of exchangeability are surprisingly strong, though. Under I2 we discussed that the operator must not only consider his marginal priors on the indicators to be equal, but must also change them equally. This is something that is insufficiently verified in the empirical psychological literature on calibration (see e.g. Lichtenstein et al. [1980] for a review up to that date). As an extreme example, consider an information source that cheats in the following way. On $x_1$ it makes its zeroth bin ridiculously large by choosing its first fractile to be extremely large. Obviously, everyone expects the true value to fall in the zeroth bin. On $x_2$ this source could stretch out another bin of its choosing. By dosing things right it could thus attain any calibration score it wants, including perfect calibration. The reason all arguments break down, though, is that I2 is violated: the operator's priors are not equal over the calibration indicators, and thus the indicators are not exchangeable.

The violations of I2 can be more subtle. For instance, suppose the operator has the same prior on $z_0$ and on $z_1, \ldots, z_n$, and he is very unsure about these beliefs on $z_1, \ldots, z_n$ but very sure of his beliefs on $z_0$ (e.g., because $z_0$ involves the outcome of some die which he has rolled a very large number of times). Now the calibration of the source on $z_1, \ldots, z_n$, especially if $n$ is small, is not going to influence the operator's opinions on $z_0$. This would have been completely different if $z_0$ were exchangeable with $z_1, \ldots, z_n$, i.e., if he were just as unconfident about his beliefs in $z_0$.

It would seem that strict exchangeability is unattainable in practice and that the calibration of experts, and with it this decision aiding system, is practically meaningless. We do not think that this is the right conclusion. The moral is, rather, that one has to be very careful in selecting a sequence $z_1, \ldots, z_n$ of calibration indicators, and try to approach the ideal as closely as possible, or at least deviate from it in a controlled and informed way. To come back to the Newtonian analogy, the fact that an inertial reference frame is unattainable in practice does not make it a useless concept, nor does it keep it from often being a practically viable assumption.

6. POSTERIOR ON THE DECISION VARIABLE


The  calibration  indicators  are  merely  derived  random  variables  designed  to  assist  in  the  analysis. 
As  stated  in  (3.1)  through  (3.3),  our  fundamental  interest  is  in  the  posterior  beliefs  on  the  decision 
variable.  In  the  experiment  every  variable  has  its  turn  as  a  decision  variable,  with  the  previous 
variables  acting  as  calibration  indicators. 

The posterior on the calibration indicator of the decision variable in question constrains the possible posteriors on the variable, since it prescribes the probability mass of each bin. However, it does not determine a unique posterior distribution on the decision variable. The way we obtain a posterior on the decision variable is to invoke F3 again. For the present purposes F3 implies that the operator wishes to change his opinions no more than is implied by the calibration data. One way to achieve this mathematically is to maximize the entropy of the posterior relative to the prior on the decision variable (equivalently, to minimize the Kullback-Leibler divergence of the posterior from the prior), under the constraints imposed by the posterior on the calibration indicator of that variable.
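Under prescribed bin masses, the minimum relative entropy solution has a simple closed form: inside each bin, the prior density is multiplied by the ratio of posterior to prior bin mass. That standard result is why a prior that is uniform within the bins yields a posterior that is also uniform within them. The sketch assumes finite outer bounds for the outer bins; `min_kl_bin_update` is a hypothetical name, and the posterior bin masses in the example are invented.

```python
def min_kl_bin_update(edges, prior_bin_probs, posterior_bin_probs):
    """Posterior density heights when the prior is uniform within each bin.

    Minimizing KL(posterior || prior) subject to prescribed bin masses q_j rescales
    the prior inside bin j by q_j / p_j, keeping its shape within the bin.
    edges: increasing bin boundaries with assumed finite outer bounds.
    """
    heights = []
    for lo, hi, p, q in zip(edges, edges[1:], prior_bin_probs, posterior_bin_probs):
        if p <= 0 and q > 0:
            raise ValueError("a bin with zero prior mass cannot receive posterior mass")
        prior_height = p / (hi - lo)  # uniform prior density within the bin
        heights.append(prior_height * (q / p) if p > 0 else 0.0)
    return heights


edges = [0.0, 8.0, 10.0, 11.0, 20.0]  # hypothetical fractiles 8, 10, 11 and bounds
heights = min_kl_bin_update(edges, [0.1, 0.4, 0.4, 0.1], [0.25, 0.25, 0.25, 0.25])
masses = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
print([round(m, 6) for m in masses])  # [0.25, 0.25, 0.25, 0.25]
```

The error branch reflects the fact that no density absolutely continuous with respect to the prior can put mass where the prior puts none.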


Since the exact shape of the prior is rather arbitrarily determined in I1 (uniform within the bins), the posterior will be just as arbitrary (but no extra arbitrariness is introduced). It can be shown that, in our case, the posterior is also uniform within the bins. Appendix 3 shows, in solid lines, the posterior distributions for the fifty-three variables. The posteriors given the information from source A alone are shown in appendix 3A. At variable A-1 the prior is equal to the posterior; the source, for lack of calibration data, has the benefit of the doubt. The true value is marked on the horizontal axis by a cross. Its value is listed in the legend of the graph, which also lists $n$ (which is of course zero, since there are no calibration results yet), the counts of the four bins (which are also zero) and the values of A's three fractiles on the variable. It can be seen that the true value fell in the zeroth bin. This is taken into account in the posterior on variable A-2. Now $n = 1$ and $y_0 = 1$; the probability of the zeroth bin is higher and that of the remaining bins proportionally lower, still uniformly distributed between the fractiles. These results correspond to the results pertaining to the same variable in appendix 2A, in which the bin distributions were plotted.


In 3A-2 the true value again falls in the zeroth bin. In variable A-3 we find indeed that the probability of the zeroth bin has increased even more. The general effect is that probability mass is slowly being shifted to the left to compensate for what, at this point, seems to be an overestimation tendency of source A. In the legend of variable A-3 we see that, thanks to I2, hits are simply added, resulting in $y_0 = 2$. This implies that all previous calibration indicators count equally in the determination of the posterior. The effect of a previous hit never disappears, although it eventually gets drowned in an overabundance of other calibration indicators. Subsequent outcomes show that overestimation is not really typical of source A. Already at, say, variable A-12, there is a significant number of true values in the third bin, accompanied by a shift of probability mass to the right in an effort to compensate for the now apparent underestimating tendencies of source A. It is also clear that probability mass is spread out more; the system is trying to compensate for A's overconfidence, as evidenced by the relatively high counts of true values in the extreme bins. As we go along the variables we see, as we also noted in the previous chapter, that this tendency persists.


The results for source B are shown in appendix 3B. As can be seen, and as was already noted in chapter 5, they are comparable to those in appendix 2A, except that B is quite a bit more overconfident and a much stronger underestimator than A. Note that the actual information a source provides in the posterior is a function of both the calibration data and the location of the source's fractiles. Although the calibration on variables A-41 and A-42 is hardly different, A is much more informative (with respect to, e.g., B) on variable 41 than on 42. Since his fractiles on variable 41 were very close together to begin with, he remained informative in spite of a considerable "flattening" of his density.


Finally, in appendix 3AB the posteriors are shown for the two-source case. Several remarks can be made. Firstly, the densities tend to be much more "peaked" than in either of the single-source cases. Thus, the combination of two sources is more informative than any single source, as one might hope. Secondly, since we have six instead of three fractiles to work with, the densities have more degrees of freedom; for instance, in this case the densities can be bimodal. Variable AB-12 is a good example, in which the sources apparently disagreed strongly on the location of the main probability mass. In other cases bimodality arises from a combination of A's and B's fractiles, as in variables AB-14, -16, -42, -51 and others.
7.  EVALUATION  OF  THE  DECISION  AID


Of course, these compensating tendencies of the posterior densities on the decision variables are highly interesting in themselves, but they do not answer the ultimate question: is the posterior better than the prior? Or, in less technical terms: is the information as corrected by this aid any better than simply taking the sources at their word?

To ask whether one probability distribution is better than another per se does not make much sense. What does make sense is to ask, a posteriori, whether one decision or control action the operator took was better than another. To pursue this line of thought, we have to structure the decision problem a little. To this end, let us suppose that the action the operator has to undertake is the choice of a value of the variable; that is, for each of the fifty-three variables he has to provide an estimate of its value. To do this, the operator needs not only a distribution over the variable, representing his knowledge, but also a loss function over the possible consequences, representing his goals. In an estimation problem the consequences consist of misestimating the true value more or less severely. What counts as more or less severe will be dictated by the particulars of his task. Often it will be just as bad to overestimate as to underestimate, in which case the loss function is symmetric around the true value. In other cases this will not be true; William Tell's loss function was probably not symmetric.

Therefore, to show rigorously that the decision aid is useful to the operator, we must show that, no matter what the operator's goals or values are, he is better off in the long run using the output of the aid rather than listening to the sources. Stated a trifle more technically, we must show that, whatever loss function the operator employs, he accrues smaller (or at most equal) losses in the long run using the posterior than he does using the prior.

Since there are infinitely many possible loss functions, it is impossible to show this empirically. What we can and will do is select a family of typical loss functions and show that the aid performs better under these. We have chosen a family of so-called bilinear loss functions, sketched in figures 7.5-i to 7.5-iv. A bilinear loss function yields zero loss if the estimate equals the true value (t.v. in figure 7.5) of the variable, and increases linearly as the estimate moves away from the true value. However, the right slope does not have to equal the left slope. The loss function in 7.5-i has a left slope that is nine times smaller than the right slope. This loss function strongly punishes overestimation: if an estimate based on the distribution is higher than the true value, more loss is accrued than when it is the same distance below the true value. Loss function 7.5-ii is symmetric, punishing overestimation and underestimation equally. The remaining two loss functions punish underestimation, 7.5-iv as heavily as 7.5-i punishes overestimation.

Let l be the left slope and r the right slope. It is well known (see e.g. Berger [1985]) that the optimal or Bayesian estimate under a bilinear loss function is the l/(l+r)-fractile of the operator's distribution: setting the derivative of the expected loss with respect to the estimate a, namely r F(a) - l [1 - F(a)] with F the operator's cumulative distribution, to zero gives F(a) = l/(l+r). Thus, the Bayesian estimate under loss function 7.5-i (l : r = 1 : 9) is the .1-fractile of the operator's distribution. The symmetric loss function makes the .5-fractile, i.e. the median of the posterior, the Bayesian estimate, and so on. Both the ratio of the slopes and the estimation fractile are given in the legend of each loss function. This points to another advantage of using bilinear loss functions to evaluate the aid. One arbitrary factor in the graphs of the densities in appendix 3 is the bounds at the extremes of the densities, which were chosen by us. A fractile is insensitive to the particular choice of these bounds. Had we chosen, for instance, a quadratic loss function, the optimal estimate would have been the mean, which is certainly not insensitive to the choice of these bounds.
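The fractile rule can be checked numerically. The sketch below (Python with NumPy, our own choice of tooling) assumes a hypothetical discrete distribution of 201 equally likely values and compares, for slope ratios l : r of 1 : 9, 1 : 1, 7 : 3 and 9 : 1, the l/(l+r)-fractile with the support point that minimizes the expected bilinear loss by brute force. The function names are illustrative only.

```python
import numpy as np

def bilinear_loss(estimate, true_value, left_slope, right_slope):
    """Zero at the true value; grows with `left_slope` when the estimate is
    too low (underestimation) and with `right_slope` when it is too high."""
    diff = np.asarray(estimate, dtype=float) - true_value
    return np.where(diff < 0, -left_slope * diff, right_slope * diff)

def bayes_fraction(left_slope, right_slope):
    """Fractile level l/(l+r) of the optimal (Bayesian) estimate."""
    return left_slope / (left_slope + right_slope)

def fractile(values, probs, q):
    """Smallest support point whose cumulative probability reaches q."""
    order = np.argsort(values)
    cdf = np.cumsum(probs[order])
    return values[order][np.searchsorted(cdf, q)]

def brute_force_estimate(values, probs, left_slope, right_slope):
    """Support point with the smallest expected bilinear loss."""
    expected = [np.sum(probs * bilinear_loss(a, values, left_slope, right_slope))
                for a in values]
    return values[int(np.argmin(expected))]

# Hypothetical distribution: 201 equally likely values 0, 1, ..., 200.
values = np.linspace(0.0, 200.0, 201)
probs = np.full(values.size, 1.0 / values.size)

for l, r in [(1, 9), (1, 1), (7, 3), (9, 1)]:
    q = bayes_fraction(l, r)
    print(l, r, q, fractile(values, probs, q), brute_force_estimate(values, probs, l, r))
```

Both methods agree for each ratio (20, 100, 140 and 180 on this toy distribution). Because the expected loss is convex and piecewise linear in the estimate, with kinks only at support points, its minimum over a discrete distribution always lies at a support point, which is why the brute-force search only tries those.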

In order to compare the prior with the posterior, we can now calculate the Bayesian estimate of each decision variable based on the prior and on the posterior, this being the appropriate fractile of these distributions. From the true value and the estimate we can calculate the losses the operator accrues with both the prior and the posterior distribution. In the long run, the losses accrued while using the posterior should be less than those accrued while using the prior, for all four loss functions.
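A minimal sketch of the bookkeeping behind the cumulative-loss curves, assuming the estimates (the appropriate fractiles of prior and posterior) have already been computed. The numbers are hypothetical toy values on one common scale, not data from the experiment, and `cumulative_losses` is an illustrative helper. As in the text, each curve starts at zero before the first variable, and the first prior and posterior estimates coincide.

```python
import numpy as np

def cumulative_losses(estimates, true_values, left_slope, right_slope):
    """Running total of bilinear losses, starting at 0 before any variable."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(true_values, dtype=float)
    losses = np.where(diff < 0, -left_slope * diff, right_slope * diff)
    return np.concatenate(([0.0], np.cumsum(losses)))

true_values = [10, 12, 8]            # hypothetical true values
prior_estimates = [11, 15, 6]        # e.g. .1-fractiles of the sources' claims
posterior_estimates = [11, 13, 8]    # first variable: posterior equals prior

l, r = 1, 9  # loss function punishing overestimation, as in 7.5-i
print(cumulative_losses(prior_estimates, true_values, l, r))      # dashed curve
print(cumulative_losses(posterior_estimates, true_values, l, r))  # solid curve
```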

In figures 7.1-i through 7.1-iv we show the cumulative losses for the four different loss functions when the operator uses only source A. The prior losses are shown by a dashed curve and the posterior losses by a solid curve. At the zeroth variable, that is, before any estimation has been done, the losses are of course zero. After the first variable, the losses for the prior equal those for the posterior, since prior and posterior are identical on this variable. At the second variable they differ. In figure 7.1-i, the prior accumulates much higher losses than the posterior. Evidently, the shift of probability mass to the left was fortunate, which is not surprising since the true value of variable A-2 also fell in the zeroth bin. The loss accrued in estimating the third variable is then added to the losses already accrued, and so on, up to the fifty-third variable, at which the two totals are almost equal. With more variables we could see whether this last phenomenon is just a statistical fluctuation and whether the two curves ultimately diverge, with the dashed curve as the upper branch, as they should. On the whole, the posterior losses lie below the prior losses. Thus, with this particular loss function, the operator is better off using the aid than listening to the source, although the difference is not spectacular. The difference is even less spectacular under the symmetric loss function, as figure 7.1-ii shows. Large gains occur, however, under the loss function that punishes underestimation, as figure 7.1-iv shows.

The reason the improvement is small in the first three cases is that the first three loss functions are sensitive to biases that the sources do not display. If source A is not an overestimator, then the aid cannot and should not correct for overestimation. Consequently, the difference between the corrected and uncorrected versions is not spectacular under a loss function that concentrates on overestimation. If the source's main bias is overconfidence, then a symmetric loss function will not bear these corrections out, since the median does not change if the density is "flattened". Of course, since A is also an underestimator, some improvement does occur. However, it will take many more than fifty-three variables to see the two curves diverge significantly, since the rate of divergence is so much smaller. What is important is that the aid did not do worse under any of the loss functions, so one is never worse off using the aid.
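The point about the median can be illustrated with a small sketch. Assuming, purely for illustration, a normal density for a source and a "flattened" correction with the same median but twice the spread (using Python's standard-library `statistics.NormalDist`; the aid's actual densities in appendix 3 are not normal), only the off-center fractiles move:

```python
from statistics import NormalDist

# Hypothetical source density and a flattened version with the same median.
source = NormalDist(mu=100.0, sigma=10.0)
flattened = NormalDist(mu=100.0, sigma=20.0)

# Estimation fractiles of the four loss functions: .1, .5, .7 and .9.
for q in (0.1, 0.5, 0.7, 0.9):
    print(q, round(source.inv_cdf(q), 2), round(flattened.inv_cdf(q), 2))
```

The .5-fractile stays at 100 while the .1- and .9-fractiles move outward, so a pure flattening registers only under the asymmetric loss functions; a correction for underestimation, by contrast, shifts the median as well.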

Since B is considerably worse calibrated than A, we expect the gains with respect to his claims to be relatively higher than in A's case. A comparison between figures 7.1 and 7.2 shows that this is indeed so. Since B is very overconfident and quite an underestimator, both figure 7.2-i and figure 7.2-iv show strong divergence, especially 7.2-iv, the case in which underestimation is punished. The prior losses in 7.2-i seem to proceed in steps because the aid shifts its estimates far to the left to avoid any overestimation, which that particular loss function punishes heavily. By doing so, the posterior estimate may on average lie further from the true value more often, but it avoids the extremely costly risk of overestimating the variable. The prior overestimates a couple of times (the steps) and, in the long run, ends up losing a lot more.

Figures 7.3-i through 7.3-iv show the losses for the two-adviser case. Again we find overall improvement, the most significant again occurring when the loss function punishing underestimation is used (figure 7.3-iv). Since the prior in the two-source case differs from the priors in the single-source cases, not much can be said about the relative improvements with respect to the prior. What can be meaningfully compared are the posterior losses for the three cases. If combining information with this aiding system is to be useful, then the posterior losses in the two-source case should be lower than in either single-source case. Figure 7.4 shows these losses for the three cases, by the indicated lines, under the same four loss functions. Under the first three, the combination is nearly always better than any single source. Figure 7.4-iv is an exception. The discrepancy, however, seems to be due to a number of "bad" variables at the beginning, after which the system is trying to catch up. With more variables we could check whether this conjecture is right. This highlights a problem of the multi-source case. In the two-source case we have sixteen bins instead of four, which means considerably less calibration data per bin to update on. Thus, if several accidentally misleading calibration indicators occur at the beginning, it will take much longer for them to "wash out" in the incoming calibration results. We are currently studying methods to overcome these problems, so that the aiding system can also be used when the number of sources and fractiles is large and the number of calibration variables n is still relatively small.
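A back-of-the-envelope sketch of this dilution: with k sources each giving m fractiles there are (m+1)^k joint bins. Taking m = 3 and the 52 earlier variables available as calibration indicators at the last variable, and assuming (only for illustration) that hits are spread evenly over the bins, the average count per bin drops sharply as sources are added. The helper names are hypothetical.

```python
def num_bins(num_fractiles, num_sources):
    """Joint bins when each of k sources gives m fractiles: (m + 1) ** k."""
    return (num_fractiles + 1) ** num_sources

def mean_count_per_bin(num_calibration, num_fractiles, num_sources):
    """Average calibration hits per bin under an (idealized) even spread."""
    return num_calibration / num_bins(num_fractiles, num_sources)

for k in (1, 2, 3):
    print(k, num_bins(3, k), mean_count_per_bin(52, 3, k))
```

One source gives 4 bins with 13 hits each on average, two sources 16 bins with 3.25, and a hypothetical third source 64 bins with fewer than one. Real hit patterns are uneven, so some bins fare worse than this average.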


8.  SUMMARY  AND  CONCLUSIONS 
A computer aiding system designed to evaluate and combine sources of information for an operator in a supervisory control situation was presented. The aid is based on six assumptions, categorized into three form assumptions and three inference assumptions, namely:


F1. There exists a real-valued random variable on whose value the operator wishes to base his decisions, which he in turn wishes to take in a Bayesian manner.


F2. The sources give their information in terms of fractiles of their distributions on the variable in F1.


F3. The sources can be completely evaluated by a sequence of calibration indicators. These consist of a record of the location of the true value of an arbitrary number of F1-like variables with respect to the F2-like information obtained on those variables.


I1. If the operator has no evidence concerning the characteristics of the sources (i.e., if the sequence in F3 has no elements), the operator starts off believing the sources.


I2. The operator considers the order in which the calibration indicators appear to be irrelevant, i.e., he considers the calibration indicators to be exchangeable random variables.


I3. The operator is as unsure as possible about assumption I1.


These six assumptions, if accepted by the operator, lead to a unique posterior distribution that represents the operator's beliefs on hearing the information from the sources, in light of their characteristics as information sources. The assumptions are designed to model the typical situation in which the sources are much more knowledgeable than the operator, and in which the operator does not know much about the characteristics of the sources and therefore feels comfortable relying on the automated procedure of measuring their characteristics by a sequence of calibration indicators.


An experiment was conducted with two different information sources advising the operator on fifty-three different decision variables. For every variable, the preceding variables were used as calibration indicators; their true values were assumed to be known by then.


The posterior distributions on these variables (the output of the decision aid) were compared under different loss functions with a policy of simply believing the sources, i.e. the prior. The aid was found to perform equally well or better under every loss function. It performs equally well under loss functions that do not bear out the biases of the source, or when a source has no biases; otherwise improvements are found. It was also found that the posterior given both sources was generally superior to the posterior given either source alone. A limiting factor in this evaluation is the small number of decision variables, which makes it unclear whether there would ultimately be savings in accumulated loss in the cases where the improvements were relatively small.
Figure 7.1: Cumulative losses of the advice (dashed) and corrected advice (solid) of source A.

Figure 7.2: Cumulative losses of the advice (dashed) and corrected advice (solid) of source B (panels 7.2-i to 7.2-iv; one panel legend reads left slope : right slope = 1 : 0.428571).

Figure 7.3: Cumulative losses of the advice (dashed) and corrected advice (solid) of sources A and B together.

Figure 7.4: Cumulative losses of the corrected advice of source A, source B, and sources A and B combined (panels 7.4-i to 7.4-iv).

Figure 7.5: Bilinear loss functions with various slope ratios.
11.  Appendix  1


(1) The Lamborghini Countach LP 500 S holds the world record for the highest tested speed. It was kept out of the "red zone", i.e. below 7000 rpm. What was this speed?

(2) What is the highest speed attained by a rocket-powered ice sled?

(3) How much power does the Grand Coulee, the most powerful power station in the world, currently produce?

(4) Ford's 15-cc UFO2 (Ultimate Fuel Optimiser) holds the record for the lowest gasoline consumption. It used the burn/coast method and drove between 13 and 22 mph. What was this record fuel consumption?

(5) What was the longest recorded distance driven on 2 wheels in a regular production-line car?

(6) The longest recorded skidmarks on a public road were left by a Jaguar involved in an accident on the M1 (English freeway) near Luton, England. Evidence given in court indicated a speed "in excess of 100 mph" before the brakes were applied. How long were these skidmarks?

(7) The fastest circumnavigation of the world was done by two Canadians in a Volvo 245. How long did it take them?

(8) Two people from Photovoltaic Power Systems in Calif. attained the highest speed ever for solar-powered vehicles with their "Sunrunner". What was their speed?

(9) Evidently there exists a practical upper limit to the size of tires. Goodyear have come closest to it by manufacturing tires for giant dump trucks. What is their diameter?

(10) How much do they weigh?

(11) And how much do they cost?

(12) What is the world speed record for human-powered (i.e. one human) vehicles?

(13) What is the height of the tallest unicycle ever mastered? (It was ridden a distance of 376 ft.)

(14) What is the highest speed attained by a railed vehicle?

(15) The longest railroad line is the Trans-Siberian, running between Moscow and Nakhodka. How long is it?

(16) The highest recorded takeoff weight of any aircraft was that of a Boeing 747-200B Jumbo jet during certification tests of its Pratt & Whitney JT9D-7Q engines. How much was that?

(17) The French hold the altitude record for helicopters, set by a 315 B Lama built by Aerospatiale SA. What is that height?

(18) Winzen Research Inc. in Minnesota built the largest balloon ever. What was its volume?

(19) The first major tidal power station is the "Usine Maremotrice de la Rance", opened in 1966 at the Rance estuary in the gulf of St. Malo, Brittany, France. It has a net annual output of 544 million kWh. How long is its barrage?

(20) And how many turbo-alternators does it contain?

(22) California boasts the world's largest solar power plant. How much does it produce?

(21) Babcock & Wilcox Co. designed the largest boilers ever. How many lbs of steam per hour are evaporated in one such boiler?

(23) The largest catalytic cracker is Exxon's Bayway Refinery plant at Linden, NJ. What is its fresh feed rate in gallons per day?


(24) The largest nuts ever made, the so-called "Pilgrim Nuts", manufactured in England, were used on the columns of a large forging press. What was their diameter?

(25) And what was their thread?

(26) The most powerful cranes in the world are two cranes operated by the Dutch company 'Heerema'. They are used in tandem to set platforms in place on the continental shelf off the coast of Holland. How much can these cranes lift (in tandem)?

(27) The fastest printer is the Radiation Inc. electro-sensitive system at the Lawrence Livermore Lab, Calif. Its speed is attained by controlling electronic pulses through chemically impregnated recording paper that moves rapidly under closed fixed styli. How many lines (of 120 alphanumeric characters) can it print per minute?

(28) The most accurate timekeeping devices are the twin atomic hydrogen masers installed at the US Naval Research Lab. They are based on the frequency of the hydrogen atom's transition period. What accuracy does this enable them to have?

(29) The Olsen clockwork, installed in the Copenhagen town hall, is the most accurate mechanical clock. How accurate is this clockwork?

(30) The world's most accurate computer to date is the CRAY-1, designed by Seymour R. Cray of Cray Research Inc., Minneapolis. What is its clock period?

(31) How many bytes of main memory does it have?

(32) And how many floating-point operations per second does it attain?

(33) The thinnest wristwatch ever made is, of course, Swiss. How thick is it?

(34) The highest known prime number was discovered on the previously mentioned Cray-1. How high is it?

(35) Yasumasa Kanada of Japan has calculated π to the highest number of decimal places. Part of his expansion has been published in what is known as the world's most boring 800 pages. How long was his original expansion? (By the way, the most inaccurate version was made by the General Assembly of Indiana, which enacted in House Bill #246 that π was de jure 4.)

(36) The most accurate balance is the Sartorius Model 4108, manufactured in Goettingen (D). To what accuracy can it weigh objects of up to .5 grams?

(37) The $13 million Large Optics Diamond Turning Machine at Lawrence Livermore Labs can make the finest cuts in the world. Estimate how many times it was able to sever a human hair lengthwise.

(38) The hottest flame that can be produced is from carbon subnitride (C4N2). What temperature is it calculated to reach at standard conditions?

(39) The Grand Coulee also possesses the largest hydraulic turbines, installed by Allis-Chalmers. At how many megawatts are these turbines rated?

(40) The lowest coefficient of static and dynamic friction (the two being equal in this case) of any solid is that of polytetrafluoroethylene ((C2F4)n), or PTFE. It is manufactured by du Pont and marketed as Teflon. Estimate this coefficient.

(41) Both the strongest and the weakest magnetic fields were achieved at the Francis Bitter labs here at MIT. The weak field is used for research on the very weak magnetic fields generated in the heart and brain. What is the strongest field?

(42) And the weakest?

(43) At MIT we also hold the record for the highest note yet attained. Such tones are produced by a laser beam striking a sapphire crystal. What is its frequency?

(44) The highest man-made temperatures occur at the center of a thermonuclear fusion bomb and are of the order of 300-400 million K. The highest controllable temperatures have been achieved at Princeton's Plasma Physics Lab. in the fusion research Princeton Large Torus. The lowest temperature was reached in a two-stage nuclear demagnetization cryostat at Espoo in Finland. What was this highest controllable temperature?

(45) And the lowest?

(46) The Naval Research Lab. in Washington, DC holds the record for the highest velocity ever attained by a solid visible object, achieved by projecting a plastic disc. What was this velocity?

(47) The Eiffel tower, designed by Alexandre Eiffel for the Paris exhibition, is 985 ft 11 in tall. Estimate its maximum sway in high winds.

(48) The longest bridge span is the main span of the Humber Estuary bridge in England, at 4,626 ft. The bridge's two towers are 533 ft 1 5/8 in tall from datum and are brought out of parallel to allow for the curvature of the earth. Estimate by how many inches they are brought out of parallel.

(49) The greatest of the Roman aqueducts was the Aqueduct of Carthage (now in Tunisia), which ran 87.6 miles. It was built during the reign of Hadrianus (117-138 AD). How much water per day did it originally supply to the city of Carthage?

(50) In 1985, at MIT, Thomas Stockebrand set out to prove that a railway system could be built to operate at velocities greater than the speed of sound. Using a working model of a supersonic subway tunnel, before world leaders in the field, he pneumatically pulled a table tennis ball through 950 ft of small-diameter pipe. At what speed did the ball travel?

(51) The tallest unsupported flagpole stands at Chula Vista, Calif., and flies the Stars and Stripes. What is its height (above the ground)?

(52) The fastest speed at which any human has traveled was attained by the crew of Apollo X when the Command Module reached its maximum speed on the trans-Earth return flight at an altitude of 400,000 ft. What was this speed?

(53) The record ocean descent was achieved in the Challenger Deep of the Mariana Trench, 250 miles southwest of Guam, when the Swiss-built US Navy bathyscaphe "Trieste" reached the ocean bed. At what depth was this?
13.  Appendix  4.


13.1  Range  of  the  calibration  indicators 


Of course there are many ways to represent the calibration indicators (the $z_i$'s) numerically. Since we are assuming that the calibration indicators are exchangeable, we will be primarily interested in the number of occurrences of each value in the sequence. One convenient way to obtain these counts is to choose the indicators such that the counts result when the sequence is summed.

In the single-source case the following scheme works. Let m be the number of fractiles (yielding m+1 bins, numbered 0 to m). Suppose the true value fell in the jth bin. Then let the calibration indicator take on an (m+1)-dimensional row vector whose elements all equal zero except the jth element, which equals unity, or

$$z_i = [0, \ldots, 0, 1, 0, \ldots, 0] \quad (\text{1 in position } j). \qquad (13.1)$$

To find the counts of the hits in the bins we can define a random variable y such that

$$y = \sum_{i=1}^{n} z_i. \qquad (13.2)$$

Obviously the count of the jth bin will be in the jth position, or

$$y = [y_0, \ldots, y_m]. \qquad (13.3)$$


In general, if we have k sources each giving m fractiles, we can think of realizations of $z_i$ as k-dimensional arrays, with one dimension for each source and m+1 entries along each dimension, as above. With two advisers, A and B, each giving three fractiles, as in the experiment, we have a four-by-four matrix. One possible realization could be

$$z_2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad (13.4)$$

which would indicate that the true value fell in A's zeroth bin and B's second bin (the rows index B's bins and the columns A's bins, both numbered from zero). In the experiment this was indeed the case at the second variable, as can be verified, e.g., in appendix 3AB.

To obtain the counts we can still apply equation (13.2), this time obtaining a k-dimensional array. The subscript j of the counts y can now be thought of as a k-dimensional vector:

$$j = (j_1, \ldots, j_k), \qquad 0 \le j_s \le m.$$

In the experiment, j is a two-dimensional vector, and the counts are of course arranged in a four-by-four array in the same way as $z_2$ in (13.4).
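To make the encoding concrete, here is a minimal NumPy sketch (our own illustration, not the aid's code) of the calibration indicators as one-hot k-dimensional arrays and of the counts in (13.2) as their elementwise sum. Following (13.4), the first axis indexes B's bins and the second A's bins, numbered from zero; `calibration_indicator` and `counts` are hypothetical names, and the hit sequence is made up.

```python
import numpy as np

def calibration_indicator(bins_hit, num_fractiles):
    """One-hot array z_i with one axis per source and num_fractiles + 1 entries
    per axis; the single 1 marks the joint bin in which the true value fell."""
    z = np.zeros((num_fractiles + 1,) * len(bins_hit), dtype=int)
    z[tuple(bins_hit)] = 1
    return z

def counts(indicators):
    """Equation (13.2): the counts y are the elementwise sum of the indicators."""
    return np.sum(indicators, axis=0)

# Two sources, three fractiles each; axis 0 = B's bin, axis 1 = A's bin.
z2 = calibration_indicator((2, 0), num_fractiles=3)  # the realization in (13.4)
print(z2)

# Hypothetical sequence of three calibration variables.
hits = [(2, 0), (2, 0), (3, 1)]
y = counts([calibration_indicator(h, 3) for h in hits])
print(y)

# Single-source case (k = 1): a one-hot row vector as in (13.1).
print(calibration_indicator((1,), 3))
```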
13.2  Derivation  of  the  posterior  on  the  calibration  indicator


In this section we will derive the results listed in section 7 under equations (5.1) and (5.2), using only the three form assumptions (F1, F2 and F3) and the three inference assumptions (I1, I2, I3).

According to the form assumptions, we can write the posterior on the calibration indicator as follows:

$$F(z_0 \mid a, \ldots, k, z_1, \ldots, z_n). \qquad (13.6)$$

For the present purposes it will prove more convenient to work with probability mass functions, or

$$p(z_0 \mid a, \ldots, k, z_1, \ldots, z_n). \qquad (13.7)$$

There are two updatings involved, one on the calibration indicators $z_1, \ldots, z_n$ and one on the information $a, \ldots, k$. We will discuss them in this order.

13.2.1  Conditioning  the  calibration  indicators 


By the definition of conditional probability we have

$$p(z_0 \mid z_1, \ldots, z_n) = \frac{p(z_0, z_1, \ldots, z_n)}{p(z_1, \ldots, z_n)}. \qquad (13.8)$$


Since the $z_i$'s (i = 0, ..., n) are exchangeable (I2), the numerator of the quotient can be expanded in terms of an $(m+1)^k$-dimensional parameter array $\theta$ with

$$\theta = [\theta_j], \qquad \theta_j \ge 0, \qquad \sum_j \theta_j = 1. \qquad (13.9)$$

De Finetti's theorem (see e.g. de Finetti [1964]) guarantees the existence of an ($(m+1)^k$-variate) distribution $F(\theta)$ such that¹

$$p(z_0 \mid z_1, \ldots, z_n) = \frac{\int_{\theta \ge 0,\ \theta \cdot \mathbf{1} = 1} \theta_j \prod_l \theta_l^{y_l} \, dF(\theta)}{p(z_1, \ldots, z_n)}, \qquad (13.10)$$

where l runs over the $(m+1)^k$ elements of $\theta$, and $z_0$, as usual, represents the event of the true value falling in bin j. If we can now show that I1 and I3 determine the form of $f(\theta)$, we will be done in principle.

Filling in n = 0 (and therefore y = 0) in (13.10), we find after some straightforward manipulation

$$p(z_0) = E(\theta). \qquad (13.11)$$

However, the left side of (13.11) is simply the prior on the calibration indicator, which is determined, via I1, by the claims of the sources. The first moment of $f(\theta)$ is thus determined. In general, i.e., filling in $y \cdot \mathbf{1} = n$ in (13.10), we find the following relation between the distribution on the calibration indicators and the moments of $f(\theta)$:

$$p(z_0, \ldots, z_n) = E\left(\theta^{\,y + z_0}\right), \qquad \theta^{\,y} \equiv \prod_l \theta_l^{y_l}. \qquad (13.12)$$

¹ In the following equation, 1 and 0 represent vectors of 1's and 0's, respectively. In this appendix we adhere to the convention that numbers take the dimensions of the variable they are equated to; e.g., y = 0 means that y equals an array (of the dimensions of y) made entirely of zeros.


As mentioned, I1 fixes the case n = 0, still leaving an infinite number of moments to be assessed.² This will be done by I3, which assumes that the operator is as unconfident as possible about I1. We shall now have to be a little more precise about this assumption. Before we determine what "as unconfident as possible" means under the constraint imposed by (13.11), we shall first fix what we mean by unconstrained unconfidence.

We shall assume that the least confident state of information is reflected by an ($(m+1)^k$-variate) uniform density over $\theta$. This assumption is quite a bit less arbitrary than it looks at first sight. Indeed, if we can show (1) that it is reasonable to assume that the physical property $\theta$ represents should indeed be uniformly distributed, and (2) that this distribution is invariant under the admissible transformations of the scale of $\theta$, then this assumption is meaningful.

(1) The proper interpretation of $\theta$ is that it represents the limiting frequencies of the hits in the bins, or the limiting hit rates (as $n \to \infty$). Classical statisticians prefer to call $\theta$ the fixed but unknown underlying probabilities of a hit in the various bins. From a strict Bayesian point of view this kind of wording does not make much sense and is even a bit misleading. First, probabilities are never unknown (in the above sense), since they can be derived, at least in principle, from an individual's preferences among actions. Secondly, such terminology would force us to make sense of a probability of a probability (equations (13.10) and (13.11)). Even if we succeeded in proposing an adequate operational definition of this particular conceptual twister, we would also need to clarify the meaning of a probability of a probability of a probability, and so on, in an infinite regress.

The interpretation in terms of limiting hit rates can be made very compelling by investigating the behavior of $f(\theta \mid z_1, \ldots, z_n)$ as $n$ grows without bound. We find after some analysis that, under mild regularity conditions,³

$$ \lim_{n \to \infty} f(\theta \mid z_1, \ldots, z_n) = \delta_{y/n}, \tag{13.13} $$

where $\delta_{y/n}$ stands for the unit impulse at $\theta = y/n$. Thus, in the long run, the operator will become convinced that the parameters of the process equal the limiting hit rates. It should be borne in mind that this is only true in the limit $n \to \infty$. For finite $n$ the operator, in general, has a non-degenerate distribution over the parameter $\theta$, expressing his uncertainty concerning its value. The parameter indexes all possible multinomial models that might describe the process of true values falling in a set of bins. Thus, the assumption boils down to assuming that all multinomial models describing the process are equally likely.
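The concentration expressed by (13.13) can be seen numerically in the simplest setting. The sketch below assumes the single-source, single-fractile case (two bins) and the uniform prior over $\theta$ discussed above. In that case the posterior after $y$ hits in $n$ trials is the Beta$(y+1, n-y+1)$ density by standard conjugacy. Its mean approaches the hit rate $y/n$ and its standard deviation shrinks roughly like $1/\sqrt{n}$. The helper name is hypothetical.

```python
import math

def uniform_prior_posterior(y: int, n: int) -> tuple:
    """Posterior mean and standard deviation of theta (hypothetical helper).

    Assumes two bins and a uniform prior on theta, so that after y hits in
    n trials the posterior is the Beta(y + 1, n - y + 1) density.
    """
    if not 0 <= y <= n:
        raise ValueError("need 0 <= y <= n")
    a, b = y + 1, n - y + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Hold the observed hit rate at 0.3 and let the number of trials grow:
# the mean approaches 0.3 and the spread shrinks toward zero.
for n in (10, 100, 1000, 10000):
    mean, sd = uniform_prior_posterior(3 * n // 10, n)
    print(f"n={n:>5}  mean={mean:.4f}  sd={sd:.4f}")
```

For every finite $n$ the spread stays positive, which is the non-degenerate distribution mentioned above.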

This kind of assumption seems reasonable and is almost always made with processes like tossing coins, rolling dice and similar paradigms. As a historical precedent, it is perhaps noteworthy that Laplace, in his famous calculation of the probability of the sun rising tomorrow, (implicitly) made the same assumption,⁴ and so did Bayes himself in the paper that gave his name to Bayesian statistics.

(2) As for the invariance of the distribution of $\theta$ under the admissible transformations of its scale, it suffices to notice that, whether one wishes to interpret the parameter as a limiting frequency or as an underlying "true" probability, it is measured on an absolute scale, so there are no admissible transformations in the first place. (See, e.g., Pfanzagl [1968] for the measurement-theoretic details.)

2. Since $f(\theta)$ is concentrated on $[0, 1]$, its moments uniquely specify its distribution.

3. That is, as long as the density on $\theta$ is never zero.

4. His result was: $p(\text{sun rises tomorrow} \mid \text{it has risen on } y \text{ out of } n \text{ days}) = (y+1)/(n+2)$. Compare this with equation (13.18).


Accepting that the uniform over $\theta$ appropriately reflects the unconstrained least confident belief state, we now determine the least confident belief state under the constraint imposed by equation (13.11). Roughly speaking, we are looking for a distribution on the parameter that is as "close" as possible to the uniform. As usual, we measure "closeness" in distribution space with the relative information measure. Precisely speaking, we wish to find the $f(\theta)$ that minimizes

$$ \int_\Theta f(\theta) \log \frac{f(\theta)}{f_u(\theta)}\, d\theta \tag{13.14} $$

(where $f_u$ denotes the uniform density over $\theta$) under the first-moment constraint imposed by (13.11). The solution to this problem is easily established with the calculus of variations. One finds that, on $[0, 1]$,

$$ f(\theta) = \frac{\lambda e^{-\lambda\theta}}{1 - e^{-\lambda}}, \tag{13.15} $$

where the value of the multiplier $\lambda$ follows from the constraint (13.11). In the multivariate case the product $\lambda\theta$ is shorthand for $\sum_j \lambda_j \theta_j$, and $f(\theta)$ is $e^{-\lambda\theta}$ divided by its normalizing integral $\int_\Theta e^{-\lambda\theta}\, d\theta$.


In the single-source, single-fractile case the result can be pictured, since a single $\theta$ (either $\theta_0$ or $\theta_1$) is sufficient to specify the distribution on $\theta$. This "maximum entropy" distribution is a renormalized exponential with its tail from 1 onward clipped off. Figure 13.1 shows these densities, in solid lines, for a number of values of the expectation. It is interesting to note that for $E(\theta) = .5$ the maximum entropy distribution equals the uniform. This is, of course, because for this value of the expectation (13.11) does not effectively constrain the minimization. Furthermore, as one would expect, this family of densities is symmetric around the uniform: compare, e.g., the density for $E(\theta) = .7$ with that for $E(\theta) = .3$.
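Integrating $\theta$ against (13.15) gives the mean $E(\theta) = 1/\lambda - 1/(e^{\lambda} - 1)$. For a prescribed $E(\theta)$ the multiplier can therefore be found with a one-dimensional root search. The sketch below uses bisection, which is our choice since the text does not name a method, and the function names are hypothetical. It reproduces both observations above: $E(\theta) = .5$ gives $\lambda = 0$, which is the uniform. The multipliers for $E(\theta) = .3$ and $.7$ differ only in sign, so those two densities are mirror images of each other.

```python
import math

def maxent_mean(lam: float) -> float:
    """E(theta) under f(theta) = lam * exp(-lam * theta) / (1 - exp(-lam)) on [0, 1]."""
    if abs(lam) < 1e-6:
        return 0.5 - lam / 12.0  # series around lam = 0, where f is the uniform
    return 1.0 / lam - 1.0 / math.expm1(lam)

def solve_lambda(mean: float, lo: float = -500.0, hi: float = 500.0) -> float:
    """Multiplier lam with maxent_mean(lam) == mean, found by bisection.

    maxent_mean decreases strictly in lam (its derivative is -Var(theta)),
    so a bracket with maxent_mean(lo) > mean > maxent_mean(hi) contains
    exactly one root; the default bracket covers means of about .002 to .998.
    """
    if not maxent_mean(hi) < mean < maxent_mean(lo):
        raise ValueError("mean outside the range covered by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if maxent_mean(mid) > mean:
            lo = mid  # mean still too large: the root lies at larger lam
        else:
            hi = mid
    return 0.5 * (lo + hi)

for e in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"E(theta) = {e:.1f}  ->  lambda = {solve_lambda(e):+.4f}")
```

For $E(\theta)$ below .5 the multiplier is positive, giving a decaying exponential. Above .5 it is negative, giving a rising one.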


Theoretically speaking, the problem is now solved, since we just need to plug (13.15) into (13.10) and carry out the integration. In the single-source, single-fractile case, i.e., the case in which $z$, $y$ and $\theta$ contain two elements, the resulting integral is an integral form of Kummer's function. No closed-form solution exists, and although the function has been extensively tabulated, this is of limited use for a computer implementation of the aiding system. In all other, higher-dimensional cases, numerical solutions are the only alternative.
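In the single-source, single-fractile case the integrals in (13.10) are one-dimensional, so a plain quadrature rule suffices. The sketch below assumes composite Simpson integration and a multiplier $\lambda$ already obtained from (13.11). Here $\theta$ is the probability of bin 0 and $1 - \theta$ that of bin 1. The code evaluates the predictive probability that the true value falls in bin 0 after $y_0$ of $n$ calibration values fell there. All names are hypothetical. With $\lambda = 0$ the prior is uniform and the result reduces to Laplace's $(y_0+1)/(n+2)$. With $n = 0$ it reduces to $E(\theta)$, as in (13.11).

```python
import math

def maxent_density(theta: float, lam: float) -> float:
    """Single-parameter maximum entropy density (13.15) on [0, 1]."""
    if abs(lam) < 1e-12:
        return 1.0  # lam = 0 is the uniform
    return lam * math.exp(-lam * theta) / -math.expm1(-lam)

def simpson(g, a: float, b: float, intervals: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = g(a) + g(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def predictive_bin0(y0: int, n: int, lam: float) -> float:
    """p(true value in bin 0 | y0 of n calibration hits in bin 0), per (13.10).

    Ratio of two moment integrals of the maximum entropy prior.
    """
    y1 = n - y0
    num = simpson(lambda t: t ** (y0 + 1) * (1 - t) ** y1 * maxent_density(t, lam), 0.0, 1.0)
    den = simpson(lambda t: t ** y0 * (1 - t) ** y1 * maxent_density(t, lam), 0.0, 1.0)
    return num / den

print(predictive_bin0(3, 10, 0.0))  # uniform prior: Laplace's rule, 4/12
print(predictive_bin0(3, 10, 2.0))  # prior mean of about .34 pulls the estimate lower
```

For large $n$ the integrands become very small, so a practical version would work with logarithms. In more than two bins the integral is multidimensional, which is what motivates the Dirichlet approximation that follows.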


To speed up computations for computer implementation and to gain physical insight into the solution, we will search for a distribution that approximates the maximum entropy distribution closely enough for practical purposes and that simplifies the integral in (13.10). Suppose $f(\theta)$ were an ($(m+1)^k$-variate) Dirichlet distribution with parameter $r$, which is a by now usual $(m+1)^k$-dimensional array of $r_j$'s. Customarily one also defines a parameter $m$ in terms of $r$ as follows:

$$ m = \sum_j r_j. \tag{13.16} $$

In other words, consider

$$ f(\theta) = \frac{\Gamma(m)}{\prod_j \Gamma(r_j)} \prod_j \theta_j^{\,r_j - 1}; \tag{13.17} $$

then the integral in (13.10) simplifies considerably. Indeed, after some straightforward manipulation we find (also replacing $z_1, \ldots, z_n$ by $y$) that, for the true value of $z_0$ falling in bin $j$,

$$ p(z_0 \mid y) = \frac{y_j + r_j}{n + m}. \tag{13.18} $$

I1 imposes a constraint on the parameters. Filling in $y = 0$, $n = 0$ in (13.18) we find

$$ p(z_0) = \frac{r_j}{m}, \tag{13.19} $$

the left side of which is determined by I1.
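Equation (13.18) is the usual posterior predictive of a Dirichlet-multinomial model: add the observed counts to the prior parameters and normalize. The minimal sketch below assumes the $(m+1)^k$ bins have been flattened into a single list, a layout chosen only for illustration, so that $y$ and $r$ are plain sequences indexed by $j$. Exact fractions are used so the outputs can be checked by hand, and the function name is hypothetical. In the code, `m` is the Dirichlet $m$ of (13.16). With $r_j = 1$ for two bins the result is Laplace's rule from footnote 4. With no data it returns $r_j/m$, as in (13.19).

```python
from fractions import Fraction

def dirichlet_predictive(y, r):
    """p(z0 falls in bin j | y) = (y_j + r_j) / (n + m) for every bin j, per (13.18).

    y: hit counts per bin, r: Dirichlet parameters per bin, both flattened
    to one list over the (m+1)^k bins (an illustrative layout).
    """
    if len(y) != len(r):
        raise ValueError("y and r need one entry per bin")
    n = sum(y)
    m = sum(Fraction(r_j) for r_j in r)  # (13.16)
    return [(y_j + Fraction(r_j)) / (n + m) for y_j, r_j in zip(y, r)]

print(dirichlet_predictive([3, 7], [1, 1]))        # Laplace: 4/12 and 8/12
print(dirichlet_predictive([0, 0, 0], [1, 2, 1]))  # no data: r_j / m, as in (13.19)
```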

The fact that the integral solves elegantly using a Dirichlet distribution says, of course, absolutely nothing about how well this family of distributions approximates the maximum entropy distributions that follow from the assumptions. Fortunately, the family of Dirichlets is very rich and excellent approximations exist. To find a "best" fit, we use the by now familiar relative entropy metric. Of course, for values of $(m+1)^k$ higher than 2, the fits cannot be shown in two dimensions. In the two-dimensional case the Dirichlets become Beta distributions. Figure 13.1 shows the Beta fits (dashed curves) to the maximum entropy distributions (solid curves), obtained by minimizing the relative entropy of the Beta distributions with respect to the maximum entropy distribution with a numerical procedure on a computer. The figure shows both distributions for several values of the expectation. Note that for $E(\theta) = .5$ the fit is perfect, since the uniform is a member of both families. For other values the fit is not perfect, but it may without a doubt be called good enough for our purposes.


Figure 13.1. Maximum entropy densities (solid) and the approximating Beta densities (dashed) for $E(\theta) = .1, .3, .5$, and $.9$.


It appears from the numerical results that for $p(z_0) < .5$, $r_j$ is approximately 1, while for $p(z_0) > .5$ we find that $r_j$ is approximately $m - 1$. (This can also be verified by inspecting the functional form of the Dirichlet and substituting.) Substituting this knowledge, together with (13.19), into (13.18) yields the result quoted in the main text for $p(z_0) < .5$. A similar result can be obtained for $p(z_0) > .5$.
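The Beta fit behind Figure 13.1 can be sketched under one assumption that the text leaves implicit: the fitted Beta$(a, b)$ must have the same mean $\mu = E(\theta)$ as the maximum entropy density. Under that constraint, the relative entropy of the Beta with respect to (13.15) equals minus the Beta's entropy plus a constant. This holds because $\log f(\theta)$ is linear in $\theta$ and the mean is fixed. The best fit is therefore the highest-entropy Beta with $a/(a+b) = \mu$. The search over the total $a + b$ uses golden-section search. The digamma function comes from the standard recurrence plus asymptotic series, since Python's `math` module has none. All names are hypothetical, and this is not claimed to be the numerical procedure used for the figure.

```python
import math

def digamma(x: float) -> float:
    """psi(x) for x > 0: shift upward using psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def beta_entropy(a: float, b: float) -> float:
    """Differential entropy (in nats) of the Beta(a, b) density."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_beta - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))

def fit_beta(mean: float, lo: float = math.log(0.2), hi: float = math.log(1e4)):
    """Highest-entropy Beta(a, b) with a / (a + b) == mean.

    Golden-section search over t = log(a + b); assumes the entropy is
    unimodal in t on [lo, hi].
    """
    def h(t: float) -> float:
        s = math.exp(t)
        return beta_entropy(mean * s, (1 - mean) * s)

    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(200):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if h(c) > h(d):
            hi = d  # maximum lies in [lo, d]
        else:
            lo = c  # maximum lies in [c, hi]
    s = math.exp(0.5 * (lo + hi))
    return mean * s, (1 - mean) * s

for mu in (0.1, 0.3, 0.5, 0.7, 0.9):
    a, b = fit_beta(mu)
    print(f"E(theta)={mu:.1f}  a={a:.3f}  b={b:.3f}")
```

Running this lets one compare the fitted parameters with the observation above, namely $r_j \approx 1$ below .5 and $m - r_j \approx 1$ above .5. At $E(\theta) = .5$ it returns the uniform $a = b = 1$.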
13.2.2 Conditioning on the information


F3 states that all the information about the sources' characteristics is contained in $y$. From this point of view it could be argued that $F(x_0) = F(x_0 \mid a, \ldots, k)$. The problem is that we cannot argue the same for the calibration indicators. To bring out the problem, we write

$$ p(z_0 \mid a, \ldots, k) = \int_{\text{bin } j} dF(x_0 \mid a, \ldots, k). \tag{13.20} $$

To simplify matters a little, assume that there is an operator for whom indeed $F(x_0) = F(x_0 \mid a, \ldots, k)$. It follows that

$$ p(z_0 \mid a, \ldots, k) = \int_{\text{bin } j} dF(x_0). \tag{13.21} $$

However, once the information has been obtained, some bins simply cannot occur anymore, so there is no reasonable $F(x_0)$ that lends positive weight to these bins. In the single-source case this only arises if the source reverses the order of its fractiles, and we could make a good case for simply rejecting the information from such an incoherent source. In the multi-source case, however, this arises quite naturally. For instance, at variable number AB1, we have by (13.21) that

$$ p\left(z_0 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \,\middle|\, a, b\right) = 0. \tag{13.22} $$


The obvious solution, of course, is to use the conditioning on $a, \ldots, k$ to set the probabilities of these bins equal to zero. The naive approach that we have taken is simply to renormalize the remaining bins or, equivalently, to assume the likelihood function to be constant on the non-zero bins. Objections to this solution are easily raised. If a bin is very small, we would say that no $F(x_0)$ should assign much probability mass to it. However, the scales of the variables are, in general, not absolute, so a small bin can be made into an arbitrarily large bin.
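The naive renormalization just described takes only a few lines once the impossible bins are known. The minimal sketch below assumes that the bin probabilities are held in a flat list. A hypothetical boolean mask marks which bins the sources' fractiles leave possible for the variable at hand. The function name is also hypothetical.

```python
def renormalize_possible(probs, possible):
    """Set impossible bins to zero and rescale the rest to sum to one.

    Equivalent to a likelihood that is constant on the possible bins and
    zero elsewhere.
    """
    if len(probs) != len(possible):
        raise ValueError("one mask entry per bin is required")
    kept = [p if ok else 0.0 for p, ok in zip(probs, possible)]
    total = sum(kept)
    if total <= 0:
        raise ValueError("no probability mass left on the possible bins")
    return [p / total for p in kept]

print(renormalize_possible([0.1, 0.2, 0.3, 0.4], [True, False, True, True]))
```

The rescaling ignores bin widths entirely, which is exactly the objection raised above: a small bin keeps the same relative weight as a large one.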


The proper Bayesian solution is, of course, to assume that $F(x_0)$ is known, and to derive $F(x_0 \mid a, \ldots, k)$ from this distribution. This poses two practical problems. Firstly, it is very difficult to obtain $F(x_0)$ (a couple of fractiles won't do). At any rate, most operators would prefer I1 to performing this task for $x_0$ and all the calibration variables $x_1, \ldots, x_n$. Secondly, even if we obtained all these distributions, we would have to hope that I2 still holds. A necessary (though not even sufficient) condition is that, for all $k, l$ in $\{0, 1, \ldots, n\}$ and all $j$,

$$ \int_{\text{bin } j} dF(x_k) = \int_{\text{bin } j} dF(x_l), \tag{13.23} $$

which is simply too good to be true. In fact, I1 relies on the operator's total unconfidence in his uncertainty (compare I3), so that he won't be too bothered by coherency constraints like (13.23).


Returning to our problem of determining the posterior on the calibration indicator, we reason as above; that is, $p(z_0 \mid a, \ldots, k, y)$ can be determined from $p(z_0 \mid y)$ by setting the probabilities of the impossible bins equal to zero and renormalizing. Alternatively, we could also have conditioned on $a, \ldots, k$ first. In this case (13.19) becomes

$$ p(z_0 \mid a, \ldots, k) = \frac{r'_j}{m'}, \tag{13.24} $$

where the primed parameters take care of the fact that certain bins have probability zero (and that the rest are renormalized). For the posterior we find that

$$ p(z_0 \mid a, \ldots, k, y) = \frac{y_j + r'_j}{n + m'}, \tag{13.25} $$

which obviously equals the posterior obtained by conditioning in the reverse order.


13.3  Extension  to  degree  of  confidence  in  prior 


Apart from being mathematically convenient in solving the inference problem, the Dirichlet also gives an insightful reinterpretation of the degree of confidence in the prior in terms of "equivalent calibration data". Suppose our prior for a true value falling in the zeroth bin is .5 ($= r_0/m$). We find the posterior, after we have observed $y_0$ out of $n$ true values falling in the zeroth bin, to be

$$ p(z_0 \mid a, y) = \frac{y_0 + r_0}{n + m}, \tag{13.26} $$

simply a special case of (13.18).

If we opt for being the least confident, we find that $r_0 = 1$ and $m = 2$. Looking at the above equation, this suggests that the prior can be viewed as representing a kind of "equivalent calibration data", namely one hit in two trials. Within the constraints of the Beta distributions, these are the lowest values for the parameters that satisfy the prior constraint (i.e., that $r_0/m = .5$). However, much larger values of $r_0$ and $m$ also satisfy the prior, for instance $r_0 = 100$ and $m = 200$. Of course, we would hardly call this a least confident prior, since it will now take much more "real" calibration data to change the posterior. This suggests an operational way of solving the problem when the operator does not feel absolutely unconfident about his prior: in this case we will have to elicit an "equivalent calibration sequence" for his prior.
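The "equivalent calibration data" reading of (13.26) is easy to check numerically. The sketch below compares the least confident prior $r_0 = 1$, $m = 2$ with the much more confident $r_0 = 100$, $m = 200$. Both are given the same hypothetical calibration record of ten hits in ten trials. The helper name is hypothetical, and exact fractions keep the arithmetic transparent. Both priors start at .5, but only the first moves appreciably.

```python
from fractions import Fraction

def posterior_bin0(y0: int, n: int, r0: int, m: int) -> Fraction:
    """(y0 + r0) / (n + m), equation (13.26)."""
    return Fraction(y0 + r0, n + m)

for r0, m in ((1, 2), (100, 200)):
    prior = posterior_bin0(0, 0, r0, m)
    post = posterior_bin0(10, 10, r0, m)
    print(f"r0={r0:>3} m={m:>3}  prior={float(prior):.3f}  after 10/10 hits={float(post):.3f}")
```

In this reading, $r_0$ and $m$ play the role of the hits and trials of an imaginary calibration record that the operator regards as equivalent to his prior.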

Of course there are other ways to elicit the necessary information. However, if we do not assume something about the form of the distribution of $\theta$, we will have to elicit, in some way or another, a (countably) infinite series of moments. If an operator agrees that some family of distributions describes his confidence accurately enough, and that family has a finite number of parameters, we need only assess a finite number of moments.⁵ Only applications of the theory to real situations will show whether such polishing of the theory is indeed necessary.


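5. The Beta has two parameters and therefore we need only assess two moments. The prior ($= r_0/m$) constitutes the first. The second, $E(\theta_0^2) = \frac{r_0}{m} \cdot \frac{r_0 + 1}{m + 1}$, follows once $(r_0 + 1)/(m + 1)$, the probability of a hit in the zeroth bin after one observed hit there, is also assessed.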
14. Appendix 5


14.1  List  of  symbols 


$A$: (State of information of) source $A$.

$A, \ldots, K$: (States of information of) sources $A$ through $K$. Defying standard alphabetic enumeration, we assume there are $k$ such sources.

$a$: $A$'s fractiles, i.e., $(a_1, \ldots, a_m)$ on $x_0$.

$B$: (State of information of) source $B$.

$b$: $B$'s fractiles, i.e., $(b_1, \ldots, b_m)$ on $x_0$.

$F(\cdot)$: The operator's marginal probability distribution function for some random variable.

$f(\cdot)$: The operator's marginal probability density function for some random variable.

F1, F2 and F3: Form assumptions 1, 2 and 3, respectively.

$i$: Calibration indicator or variable number, $i = 0, 1, \ldots, n$.

I1, I2 and I3: Inference assumptions 1, 2 and 3, respectively.

$j$: $k$-dimensional row vector (actually, just an ordered set) $[j_1, \ldots, j_k]$ indexing the bins.

$K$: (State of information of) source $K$.

$k$: Source $K$'s fractiles.

$k$: Number of sources.

$m$: Number of fractiles elicited from each source per variable.

$n$: Number of calibration indicators and calibration variables.

$P(\cdot)$: The operator's probability measure.

$r$: $(m+1)^k$-array containing the parameters of the Dirichlet distribution.

$r_j$: $j$th element of the above array.

$x_0$: An outcome of the decision variable $\tilde{x}_0$.

$\tilde{x}_0$: Decision variable.

$x_i$: Outcome of calibration variable $\tilde{x}_i$, $i = 1, 2, \ldots, n$.

$\tilde{x}_i$: Calibration variable, $i = 1, 2, \ldots, n$.

$y$: Outcome of $\tilde{y}$.

$\tilde{y}$: $(m+1)^k$-array containing the counts of the hits in the various bins at the $n$th calibration indicator.

$y_j$: Number of hits in bin $j$ at the $n$th calibration indicator.

$z_0$: Outcome of the calibration indicator of the decision variable.

$\tilde{z}_0$: Calibration indicator of the decision variable.

$z_i$: Outcome of calibration indicator $\tilde{z}_i$, $i = 1, 2, \ldots, n$.

$\tilde{z}_i$: Calibration indicator, $i = 1, \ldots, n$.

$\Gamma(\cdot)$: Gamma function, i.e., $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$.

$\theta$: Outcome of $\tilde{\theta}$.

$\tilde{\theta}$: $(m+1)^k$-array containing the parameters of the multinomial model describing the stochastics of true values falling in bins.